Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design doc: Embed APIv3 #8222
Design doc: Embed APIv3 #8222
Changes from 13 commits
9d4ebe5
fe633ec
66d839e
06c8a97
7448158
1d3c097
1d43423
81edc9c
89a5795
4efd3ec
adebe39
e307563
f783339
efe5abc
145b6e8
5063bbc
84c7ce0
3b320e8
b34fce0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still were discussing this. I don't think we can support it without having to handle specific cases, not all returned html will have
p
tags or the word will end in a valid tag, we also need to skip tags from the lenghtHaving the client using css to "hide" long texts is more easy to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example you shared would return:
?nwords=1
:<p>Cut</p>
?nwords=2
:<p>Cut <strong>here?</strong></p>
?nparagraphs=1
:<p>Cut <strong>here?</strong></p>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but the problem is calculating the words and taking the tags into account, you'll be removing tags from the original html, possible breaking some styles like with tables/definition lists. And also how would you know if there is a paragraph when you have content surrounded in other tags like lists or divs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been playing with this already in https://github.com/readthedocs/readthedocs-ext/pull/304/. How to do it is an implementation detail and I'm sure there is going to be some problems we will need to solve. However, I don't think it's impossible.
I've been playing a little with BeautifulSoup already and this seems to work close enough:
Click to see the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about returning this information on the same endpoint, is confusing returning the content of a section plus the list of sections, we should have a different endpoint to return data about the page itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand here why it may cause confusion and how having a different endpoint to only show the available
id=
for a page is better? I think I'm not against a different endpoint, but I want to understand your position better.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you are requesting information for a section, but getting information about the page too, you don't want to have to query a random section just to get information about the page. This also allow us to return more information about the page itself like the title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @agjohnson here since he wrote:
in #7117
@agjohnson can you take a look at this proposal and suggest that would be the ideal response to you and how you would like to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agjohnson returning headers may not be useful if those headers don't have an
id=
that we can find. Also, I realized that it's not trivial to "find the title for a particularid=
" (we have this hack currently implemented that only works overa
tags. Theid=
could be in thediv
surrounding theh1
tag or thedl
tag or any other.For this case, we could get the
.next()
from theid=configuration
. However, we can't guarantee that this will always work. Even if we get it, we will want to remove the trailing¶
from its name as well in this case (but it could be a different char)This one is similar, but the
.next()
element is exactly the title we are looking for. It does not contain the¶
char.However, if the title we want is exactly on the
h1
ordl
(instead of their child), we will fail to detect it and we would return something invalid.So, I think it's still useful to return all the available
id=
s from the page so developers can explore them, but it's not easy (maybe impossible) to know the exact title for that particularid=
and the exact hierarchy as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to a particular endpoint
/api/v3/embed/identifiers/
and define the initial response: "return all the available identifiers for a specific page".We can expand it later if we need something else and there is a good way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked more here, and the use case or feature that I'm describing is currently in a strange place.
More explicitly, I require a list of headings -- basically the toctree on a document -- with heading text, URL to link to the heading, and the heading nesting level or heading nesting as a data structure. This gives me what I need to basically expose the toctree in our application. This is the feature that I had wanted to expose for commercial use -- embedding documentation metadata/headings/etc in customer applications. A hoverxref type extension might be useful in addition here, but is separate. Customers would still need to get metadata out of a particular document in order to inject a toctree into an application view.
So, where I'm at is that I feel like this feature is more of a relic of where the embed API started and it is dragging the direction of the embed API down -- however this might just be my interpretation. The embed API is more focused on having generic support, and so therefore parsing HTML, and what I want is basically exposing the contents of
objects.inv
. I could be talking about a separate feature, or we could still be talking about keeping this as a separate API endpoint.I am going to research this a bit more, I may be talking about a separate feature entirely at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
objects.inv
is already indexed & exposed w/ the SphinxDomain modeling, isn't it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, so we talked about exposing a separate API for that instead.
We need to decide if this feature is going to be a separate product and how users will interact with it. I think we'd be maintaining 1 client JS library that does both embed of hover cards and injection of document header/refs from Sphinx, as the use case is similar.
@humitos Did we talk about what mechanism we'd use if this function was a subpath on the embed API? Are we using Sphinx refs via the SphinxDomain modeling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, we didn't talk about this and I haven't thought too much either. I usually forget that we have
SphinxDomain
model --and I haven't used it yet.Something like
/api/v3/embed/headers/
may work for this use case. However, we have to keep in mind that:SphinxDomain
model?doctool=mkdocs&version=1.0.1
and parse a known HTML structureThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a better solution than doing it for all requests, and gives us a way to deprecate things 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are specifying deprecation in this document, then the use case for embedding Sphinx refs should also be mentioned here. Deprecation should depend on surfacing an API to expose documentation refs. We need to have a discussion about this end point and how it is implemented/etc next, but that can be separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely a good question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to switch the version and API name so we have the version after its name? That way we can release a new version of a particular API without touching the others.
/api/search/v3/
/api/footer/v3/
/api/resources/v2/
and/api/resources/v3/
/api/embed/v3/
cc @stsewd @ericholscher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That naming feels weird to me, I think we just need to document which endpoints are part of what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stsewd can you expand how that would be and give some examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already done in part for search for example https://docs.readthedocs.io/en/stable/server-side-search.html#api we just need to do the same with the embed api and footer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stsewd I don't follow you. That search endpoint is under
/api/v2/
--so we are not differentiating the "Feature APIs" from the "Resources API" there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to make this distinction in our docs, not in the URLs