Embed APIv3: initial implementation #8319

Merged on Sep 21, 2021 (49 commits)
Commits
61554ac
Move `clean_links` to embed/utils.py
humitos Jul 5, 2021
311ada4
`clean_links` response HTML in raw instead of a PyQuery object
humitos Jul 6, 2021
76d0bf5
Implement the minimal version of the contract
humitos Jul 6, 2021
e841b98
Add docs.sympy.org to the allowed domains
humitos Jul 6, 2021
78f6656
Implement doctool= usage for Sphinx `dt` known cases
humitos Jul 6, 2021
fe3a507
Use timeout= when requesting external pages
humitos Jul 6, 2021
28e4e83
Handle requests TooManyRedirects and other errors
humitos Jul 6, 2021
71b9e2d
Use cache keys and Django settings for timeouts
humitos Jul 7, 2021
1aa70e3
More logs and comments
humitos Jul 7, 2021
5bab780
Remove one unneeded conditional
humitos Jul 7, 2021
59fe0f1
Return fragment=null if there is no fragment
humitos Jul 7, 2021
517d3e8
Use setting for request.get `timeout=` argument
humitos Jul 12, 2021
562c260
Log exception to track wrong URLs in Sentry
humitos Jul 12, 2021
73e2ee3
Sanitize the URL inside `_download_page_content`
humitos Jul 12, 2021
854a44b
Handle malformed URLs (not netloc or scheme)
humitos Jul 12, 2021
49e02d5
Use for/else syntax sugar instead of a temporary variable
humitos Jul 12, 2021
be32bd3
Call `clean_links` before creating the response
humitos Jul 12, 2021
cc2227c
Do not depend on implicit state: pass the required arguments
humitos Jul 12, 2021
1a72c07
Don't return metadata (project, version, language, path)
humitos Jul 12, 2021
7b6b493
Update readthedocs/embed/v3/views.py
humitos Jul 15, 2021
418d9ac
Improve the response http status codes
humitos Jul 19, 2021
8a77043
Sanitize URL before passing it to `clean_links`
humitos Jul 19, 2021
d9ca50e
Comment to sanitize `cache_key` by URL
humitos Jul 19, 2021
3be9b8e
Update import for `clean_links` in tests
humitos Jul 19, 2021
a27122a
Do not call selectolax if there is no content
humitos Jul 19, 2021
84c3d18
Check if the domain is valid before calling `unresolver`
humitos Jul 19, 2021
7adccb2
Remove tedious warnings from pytest
humitos Jul 19, 2021
a1cf45a
Initial test suite for EmbedAPI v3
humitos Jul 19, 2021
ec2fd5f
Add `doctoolwriter` to allow `html4` and `html5` on Sphinx
humitos Aug 16, 2021
f05da3f
Run EmbedAPIv3 test on a different tox environment
humitos Aug 16, 2021
7f3e3a9
Fix tests with proper error message
humitos Aug 16, 2021
bcfba37
Run tests-embedapi in CircleCI
humitos Aug 16, 2021
905cbcf
Consider docutils 0.16 and 0.17 when checking HTML output
humitos Aug 16, 2021
b0dc81f
Revert "Fix tests with proper error message"
humitos Aug 16, 2021
9810f89
Revert "Add `doctoolwriter` to allow `html4` and `html5` on Sphinx"
humitos Aug 16, 2021
05936d3
Lint
humitos Aug 16, 2021
333c892
Disable unused-argument for now
humitos Aug 16, 2021
c4751c7
Make test for sphinxcontrib-bibtex to pass
humitos Aug 18, 2021
7dd4b4f
Auto-delete _build directory after test run
humitos Aug 18, 2021
d134ab9
Checks that depend on Sphinx version (3.5)
humitos Aug 18, 2021
4b5f2d6
Don't make doctoolversion= attribute mandatory when passing doctool=
humitos Aug 18, 2021
0056e90
Sphinx 3.5 seems to be different on its HTML
humitos Aug 18, 2021
cf3da8a
Lint
humitos Aug 18, 2021
aeaad76
Fragment case changed on 3.0.0
humitos Aug 18, 2021
ef5a14c
Don't run EmbedAPIv3 tests by default
humitos Aug 18, 2021
160e51b
Sphinx 3.5 adds an <span> on <dl>
humitos Aug 18, 2021
cc76893
Update readthedocs/embed/v3/tests/test_external_pages.py
humitos Sep 9, 2021
5e25c47
Log url= together with fragment=
humitos Sep 21, 2021
7edb272
Merge branch 'master' of github.com:readthedocs/readthedocs.org into …
humitos Sep 21, 2021
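
Several of the commits above deal with URL handling: rejecting malformed URLs that lack a scheme or netloc, and returning `fragment=null` when the URL has no fragment. The following is a minimal sketch of that idea using only the standard library. The helper name `split_embed_url` and the example domain are hypothetical and do not come from the PR; the actual view code may structure this differently.

```python
from urllib.parse import urlparse


def split_embed_url(url):
    """Return (page_url, fragment) or raise ValueError for a malformed URL.

    Hypothetical helper for illustration only; not the function used in the PR.
    """
    parsed = urlparse(url)

    # A usable external URL needs both a scheme ("https") and a netloc
    # ("docs.example.com"); otherwise there is nothing to request.
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Malformed URL (missing scheme or netloc): {url!r}")

    # Report a missing fragment as None (serialized as null in JSON)
    # rather than as an empty string.
    fragment = parsed.fragment or None

    # Drop the fragment so the page itself can be fetched and cached once,
    # regardless of which section was requested.
    page_url = parsed._replace(fragment="").geturl()
    return page_url, fragment
```

With this sketch, `split_embed_url("https://docs.example.com/en/latest/index.html#install")` yields the page URL without the fragment plus `"install"`, while a value such as `"docs.example.com/page.html"` raises `ValueError` because `urlparse` finds neither a scheme nor a netloc.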
52 changes: 36 additions & 16 deletions readthedocs/embed/v3/views.py
@@ -68,15 +68,7 @@ def _download_page_content(self, url):
             log.debug('Cached response. url=%s', url)
             return cached_response

-        try:
-            response = requests.get(url, timeout=settings.RTD_EMBED_API_DEFAULT_REQUEST_TIMEOUT)
-        except requests.exceptions.TooManyRedirects:
-            log.exception('Too many redirects. url=%s', url)
-            return
-        except Exception:  # noqa
-            log.exception('There was an error reading the URL requested. url=%s', url)
-            return
-
+        response = requests.get(url, timeout=settings.RTD_EMBED_API_DEFAULT_REQUEST_TIMEOUT)
         if response.ok:
             cache.set(
                 cache_key,
@@ -286,13 +278,41 @@ def get(self, request):
         # whitespaces (spaces, tabs, etc.).
         fragment = parsed_url.fragment

-        content_requested = self._get_content_by_fragment(
-            url,
-            fragment,
-            external,
-            doctool,
-            doctoolversion,
-        )
+        try:
+            content_requested = self._get_content_by_fragment(
+                url,
+                fragment,
+                external,
+                doctool,
+                doctoolversion,
+            )
+        except requests.exceptions.TooManyRedirects:
+            log.exception('Too many redirects. url=%s', url)
+            return Response(
+                {
+                    'error': (
+                        'The URL requested generates too many redirects. '
+                        f'url={url}'
+                    )
+                },
+                # TODO: review these status codes to find out which one is better here
+                # 400 Bad Request
+                # 502 Bad Gateway
+                # 503 Service Unavailable
+                status=status.HTTP_400_BAD_REQUEST,
+            )
+        except Exception:  # noqa
+            log.exception('There was an error reading the URL requested. url=%s', url)
+            return Response(
+                {
+                    'error': (
+                        'There was an error reading the URL requested. '
+                        f'url={url}'
+                    )
+                },
+                status=status.HTTP_400_BAD_REQUEST,
+            )
+
         if not content_requested:
             log.warning('Identifier not found. url=%s', url)
             return Response(