[bbc] Fix BBCCoUkIPlayerPlaylistIE #28360
… do mix multiple changes in a single PR.
Branch updated from c2c0fd4 to 4863404.
Allow white space between tags as now sent by BBC.
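A minimal Python sketch of why this matters, assuming a hypothetical "next page" fragment in which the tags are separated by a newline. Neither the markup nor the two patterns below are the actual BBC HTML or the regex used in BBCCoUkPlaylistBaseIE; they only show how \s* between the closing and opening tags lets the match survive the newline.

```python
import re

# Hypothetical markup: a newline now separates the adjacent tags.
HTML = '<li class="pagination__next">\n<a href="?page=2">Next</a></li>'

# Hypothetical "before" pattern: assumes <a> immediately follows the <li> tag.
STRICT = re.compile(r'<li[^>]+class="pagination__next"[^>]*><a[^>]+href="([^"]+)"')

# Hypothetical "after" pattern: \s* tolerates any whitespace, including newlines.
TOLERANT = re.compile(r'<li[^>]+class="pagination__next"[^>]*>\s*<a[^>]+href="([^"]+)"')


def next_page(html, pattern):
    """Return the next-page href, or None if the pattern does not match."""
    m = pattern.search(html)
    return m.group(1) if m else None


print(next_page(HTML, STRICT))    # None
print(next_page(HTML, TOLERANT))  # ?page=2
```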
Branch updated from 4863404 to dc43160.
The current version should deal with your comments, but the download tests aren't yet validated: CI is failing because …
As I was looking at this, I think there is a better way: an optimized version that would reduce the number of requests, still support all the expected features, and extract more metadata for …
Feel free to use the test URLs from this PR for that. See also PR #23438 for a previous solution and set of test cases. Generally, even though the BBC site support is split into different extractors, the BBCIE extractor in particular implements multiple extraction tactics, and it's difficult to know which of these are still valid; perhaps some telemetry would help, as would refactoring the oversized _real_extract() method. It seems that two types of tactics are currently valid:
Maybe generic methods can be created to support these tactics? I have further updates to fix some of the failing test cases, but will hold off if you're reimplementing.
Branch updated from 6920932 to a5b6df4.
The playlist metadata is now sent in a JSON expression within a <script> element. Save the initial playlist_id
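The approach in this commit can be sketched as follows, assuming the state object is assigned to a JavaScript variable inside a <script> element. The variable name window.__EXAMPLE_STATE__ and the header/title/description layout are hypothetical placeholders, not the real iPlayer page structure.

```python
import json
import re

# Hypothetical page fragment: the variable name and JSON layout are assumptions.
PAGE = '''<script>
window.__EXAMPLE_STATE__ = {"header": {"title": "Example Series",
 "description": "An example description."}, "entities": []};
</script>'''


def extract_state(html):
    """Parse the JSON object assigned to window.__EXAMPLE_STATE__, or return None."""
    m = re.search(
        r'<script[^>]*>\s*window\.__EXAMPLE_STATE__\s*=\s*(\{.+?\})\s*;\s*</script>',
        html, re.DOTALL)
    return json.loads(m.group(1)) if m else None


state = extract_state(PAGE)
print(state['header']['title'])        # Example Series
print(state['header']['description'])  # An example description.
```

Anchoring the non-greedy (\{.+?\}) on the trailing ; and </script> keeps the match from stopping at the closing brace of the nested header object.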
The main BBC extractor will have to be split up in the future. This can be done gradually by moving the code related to a specific URL pattern into its own extractor, which would certainly make the code more manageable.
It always depends on whether there is a better way of doing things. In the case of the BBC Reel code, even though I would have preferred to introduce it differently, I accepted it with only small changes because the code can be reused in the future. In this case, however, the method I added is quite different from the code here, so there isn't much code that can be reused (except for the tests).
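The idea of moving the code for a specific URL pattern into its own extractor can be sketched as a dispatch on URL patterns. This is a self-contained illustration in the spirit of youtube-dl's per-extractor _VALID_URL and suitable(); the class names and patterns are hypothetical and neither import nor reproduce the real extractors.

```python
import re


class Extractor(object):
    """Hypothetical base class: each subclass claims URLs via _VALID_URL."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # re.match anchors at the start of the URL only.
        return re.match(cls._VALID_URL, url) is not None


class IPlayerEpisodesExtractor(Extractor):
    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>\w+)'


class GenericBBCExtractor(Extractor):
    # Catch-all for any BBC URL not claimed by a more specific extractor.
    _VALID_URL = r'https?://(?:www\.)?bbc\.(?:co\.uk|com)/'


# Order matters: specific extractors are tried before the catch-all.
EXTRACTORS = [IPlayerEpisodesExtractor, GenericBBCExtractor]


def pick_extractor(url):
    """Return the first extractor class whose pattern matches url, or None."""
    for ie in EXTRACTORS:
        if ie.suitable(url):
            return ie
    return None


print(pick_extractor('https://www.bbc.co.uk/iplayer/episodes/b00abcde').__name__)
# IPlayerEpisodesExtractor
```

In this sketch the catch-all must stay last in the list; the alternative is for the generic extractor's suitable() to reject any URL that a more specific extractor claims.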
The changes I mentioned aren't specific to the iPlayer extractor (fixes to tests, Morph playlists, support for Simorgh playlists). Regarding the new code, please see #15710 (comment) for a case where what started out as an iPlayer URL redirected to a …
* https://github.com/ytdl-org/youtube-dl:
  * [ard] improve clip id extraction (ytdl-org#22724) (closes ytdl-org#28528)
  * release 2021.03.25
  * [ChangeLog] Actualize [ci skip]
  * [zoom] Add new extractor (closes ytdl-org#16597, closes ytdl-org#27002, closes ytdl-org#28531)
  * [extractor] escape forgotten dot for hostnames in regular expression (ytdl-org#28530)
  * [bbc] fix BBC IPlayer Episodes/Group extraction (closes ytdl-org#28360)
  * [youtube] Fix default value for youtube_include_dash_manifest (closes ytdl-org#28523)
Please follow the guide below.
Put an x into all the boxes [ ] relevant to your pull request (like that [x]).
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
The BBCCoUkIPlayerPlaylistIE extractor was failing, whereas BBCCoUkPlaylistIE still worked for the same PIDs (e.g., those in the _TESTS).

In the iPlayer series pages, the playlist metadata is now sent in a JSON expression within a <script> element. This patch extracts the title and description from it, and modifies _VIDEO_ID_TEMPLATE to match the episode links within the entities subobject of the JSON.

Also, the regex used to get the next page link for BBCCoUkPlaylistBaseIE didn't allow for a newline that is now sent between tags.

For multi-series iPlayer playlists, each series page is also processed.
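A rough sketch of matching episode links in the JSON text with a template regex and merging the results from several series pages. ID_REGEX, VIDEO_ID_TEMPLATE and the href layout are assumptions for illustration; the real _VIDEO_ID_TEMPLATE and ID pattern in the extractor may differ.

```python
import re

# Hypothetical ID pattern and template (the %s is filled with ID_REGEX).
ID_REGEX = r'[pbm][\da-z]{7}'
VIDEO_ID_TEMPLATE = r'"href"\s*:\s*"/iplayer/episode/(%s)'


def episode_ids(series_pages):
    """Collect episode IDs from the JSON text of one or more series pages,
    keeping first-seen order and dropping duplicates across series."""
    seen, ordered = set(), []
    pattern = re.compile(VIDEO_ID_TEMPLATE % ID_REGEX)
    for text in series_pages:
        for video_id in pattern.findall(text):
            if video_id not in seen:
                seen.add(video_id)
                ordered.append(video_id)
    return ordered


series_1 = ('{"entities": [{"href": "/iplayer/episode/p0000001/ep-1"}, '
            '{"href": "/iplayer/episode/p0000002/ep-2"}]}')
series_2 = ('{"entities": [{"href": "/iplayer/episode/p0000002/ep-2"}, '
            '{"href": "/iplayer/episode/p0000003/ep-3"}]}')
print(episode_ids([series_1, series_2]))
```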
For multi-page iPlayer playlists, the pagination regex is extended to match the "Next Page" link on these pages.
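Following "Next Page" links can be sketched as a generator that stops when no further link is found. The page contents, URLs and NEXT_PAGE_RE pattern are hypothetical, and a dict lookup stands in for the HTTP requests a real extractor would make.

```python
import re

# Hypothetical pages keyed by URL, standing in for network requests.
PAGES = {
    '/series?page=1': 'ep1 ep2 <a href="/series?page=2">Next Page</a>',
    '/series?page=2': 'ep3 <a href="/series?page=3">\nNext Page\n</a>',
    '/series?page=3': 'ep4',
}

# Hypothetical pagination regex; \s* allows whitespace around the link text.
NEXT_PAGE_RE = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>\s*Next Page\s*</a>')


def iter_pages(start_url, fetch):
    """Yield (url, html) for each page, following "Next Page" links until none remain."""
    url, visited = start_url, set()
    while url and url not in visited:
        visited.add(url)
        html = fetch(url)
        yield url, html
        m = NEXT_PAGE_RE.search(html)
        url = m.group(1) if m else None


urls = [u for u, _ in iter_pages('/series?page=1', PAGES.get)]
print(urls)  # ['/series?page=1', '/series?page=2', '/series?page=3']
```

The visited set is a defensive choice in this sketch, so a page that links back to itself cannot cause an endless loop.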