Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extractor/tv4] Fix tv4 extraction #5649

Merged
merged 3 commits into from
Jun 15, 2023
Merged

[extractor/tv4] Fix tv4 extraction #5649

merged 3 commits into from
Jun 15, 2023

Conversation

TxI5
Copy link
Contributor

@TxI5 TxI5 commented Nov 26, 2022

changed json api url

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

TV4 has changed their api domain. This changes the url for json metadata extraction

Fixes #5535

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

changed json api url
@TxI5
Copy link
Contributor Author

TxI5 commented Nov 26, 2022

Closes #5535

@TxI5 TxI5 changed the title Update tv4.py [extractor/tv4] Fix tv4 extraction Nov 26, 2022
@dirkf

This comment was marked as outdated.

@bashonly bashonly added geo-blocked Content is geo-blocked site-bug Issue with a specific website labels Dec 5, 2022
@@ -73,7 +73,7 @@ def _real_extract(self, url):
video_id = self._match_id(url)

info = self._download_json(
'https://playback-api.b17g.net/asset/%s' % video_id,
'https://playback2.a2d.tv/asset/%s' % video_id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use f-strings for both API URLs for consistency / readability

Suggested change
'https://playback2.a2d.tv/asset/%s' % video_id,
f'https://playback2.a2d.tv/asset/{video_id}',

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely make them consistent, but if it's done using f-strings yt-dl will just have to change it to any of the other myriad string syntaxes (I know that's our burden).

Maybe s/t like this:

    def _call_api(self, endpoint, video_id, query=None, **kwargs):
        return self._download_json(
            '/'.join(('https://playback2.a2d.tv', endpoint, video_id)),
            video_id, endpoint.join(('Downloading video ', ' JSON')),
            query=update_url_query({
                'service': 'tv4',
                'device': 'browser',
                'protocol': 'hls,dash',
            }, query), **kwargs)
...
        info = self._call_api('asset', video_id, query={
                'drm': 'widevine',
            })['metadata']
...
        manifest_url = self._call_api('play', video_id, expected_status=401)
        # don't crash during error handling
        err = traverse_obj(manifest_url, 'errorCode')
...
        # then as in https://github.com/yt-dlp/yt-dlp/pull/5649#issuecomment-1337847672 below

Related to the geo-restriction handling, in the class:

    # XFF is not effective
    _GEO_BYPASS = False

and perhaps specialise the msg for raise_geo_restricted() if, as I believe, a logged-in user can access the site from outside SE/EU:

                self.raise_geo_restricted(
                    'This video is not available from your location due to geo restriction, or not being authenticated.',
                    countries=['SE'])

Copy link
Member

@pukkandan pukkandan Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely make them consistent, but if it's done using f-strings yt-dl will just have to change it to any of the other myriad string syntaxes (I know that's our burden).

Sorry, but if we discourage the use of newer Python features, then there was no point in deprecating older versions. In most cases (including here) fstrings simply read much better than the alternatives.

You could encourage OP to make PR against youtube-dl and get it merged there first - which would reverse our responsibilities; i.e. the burden of code review will then mainly fall on you, and the burden of porting (dealing with deprecated imports/functions) on me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, but in this case it is at least possible to have just the one instance, and also avoid replicating the base API URL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I wasn't commenting on the whole thing, just on the quoted part. Having a _call_api does look good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -84,7 +84,7 @@ def _real_extract(self, url):
title = info['title']

manifest_url = self._download_json(
'https://playback-api.b17g.net/media/' + video_id,
'https://playback2.a2d.tv/play/' + video_id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'https://playback2.a2d.tv/play/' + video_id,
f'https://playback2.a2d.tv/play/{video_id}',

@bashonly
Copy link
Member

bashonly commented Dec 5, 2022

Although not related to the original issue, the geo-restriction processing could be also improved. XFF doesn't work for this site.

I've applied some yt-dl test code that could be adapted:

This is a good idea, since it seems like all tv4 content is geo-restricted now. But the diff posted above conflates the asset endpoint with the play endpoint.

A working version could look like this (using this PR's initial commit as baseline for the diff):

-        manifest_url = self._download_json(
-            'https://playback2.a2d.tv/play/' + video_id,
-            video_id, query={
+        manifest_info = self._download_json(
+            f'https://playback2.a2d.tv/play/{video_id}',
+            video_id, 'Downloading manifest info JSON', query={
                 'service': 'tv4',
                 'device': 'browser',
                 'protocol': 'hls',
-            })['playbackItem']['manifestUrl']
+            }, expected_status=401)
+        err = manifest_info.get('errorCode')
+        if err:
+            msg = manifest_info.get('message') or err
+            if 'GEO_LOCATION' in err:
+                self.raise_geo_restricted(msg, countries=['SE'])
+            raise ExtractorError(f'HTTP Error 401: Unauthorized; {msg}', video_id=video_id)
+
+        manifest_url = manifest_info['playbackItem']['manifestUrl']
+

@pukkandan
Copy link
Member

Does this still work? Pls update tests

@pukkandan pukkandan added the needs-testing Patch needs testing label Feb 3, 2023
@@ -84,7 +84,7 @@ def _real_extract(self, url):
title = info['title']

manifest_url = self._download_json(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step (manifest download) requires geo-verification outside Sweden. The following update would solve that part:

manifest_url = self._download_json(
    'https://playback2.a2d.tv/play/%s' % video_id,
    video_id, query={
        'service': 'tv4',
        'device': 'browser',
        'browser': 'GoogleChrome',
        'protocol': 'hls,dash',
        'drm': 'widevine',
        'capabilities': 'live-drm-adstitch-2,expired_assets',
    },
    headers=self.geo_verification_headers(),
)['playbackItem']['manifestUrl']

@pukkandan pukkandan force-pushed the master branch 2 times, most recently from ee280c7 to 7aeda6c Compare May 24, 2023 18:09
@bashonly bashonly added pending-review PR needs a review and removed needs-testing Patch needs testing labels Jun 14, 2023
@@ -20,19 +23,25 @@ class TV4IE(InfoExtractor):
sport/|
)
)(?P<id>[0-9]+)'''
_GEO_COUNTRIES = ['SE']
_GEO_BYPASS = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per dirkf's comment here #5649 (comment) that xff is futile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why r we passing go verification headers? Am I misunderstanding something?

Copy link
Member

@bashonly bashonly Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_GEO_BYPASS is for --xff, which has no effect
geo_verification_headers() are for --geo-verification-proxy, which does work

@bashonly bashonly merged commit 125ffaa into yt-dlp:master Jun 15, 2023
@bashonly bashonly removed the pending-review PR needs a review label Jun 15, 2023
@TxI5 TxI5 deleted the tv4 branch June 16, 2023 22:18
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
geo-blocked Content is geo-blocked site-bug Issue with a specific website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TV4] HTTP Error 503: Service Unavailable
5 participants