[YouTube] Use the new internal API in NewPipe Extractor #604

Merged
merged 55 commits into from
Aug 3, 2021

Conversation

AudricV
Member

@AudricV AudricV commented Apr 9, 2021

  • I carefully read the contribution guidelines and agree to them.
  • I have tested the API against NewPipe.
  • I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API. - Nothing needs to change on the app side with these changes in the extractor.

Use the new internal API, called InnerTube or youtubei (https://www.youtube.com/youtubei/v1/endpoint?key=INNERTUBE_API_KEY), to fetch information about YouTube content instead of using the pbj JSON (https://www.youtube.com/webpage_endpoint?pbj=1). For YouTube Music, nothing was changed because the search already uses the InnerTube API. Responses are pretty similar (most of the time, only the order of the objects changes), so this should not be as big a job as the 2020 migration from the old HTML YouTube pages to the desktop Polymer version and its pbj JSON. The pbj JSON seems to be deprecated: the desktop website of YouTube only uses it for video comments right now (and there are A/B tests with the next endpoint).
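To make the URL pattern above concrete, here is a minimal sketch of how a request to an InnerTube endpoint could be assembled. The class and method names are hypothetical, and the `context.client` body layout (with `WEB` as the client name) is an assumption that is not spelled out in this PR description; the key and client version are left as placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of NewPipe Extractor.
final class InnertubeRequestSketch {
    private static final String BASE_URL = "https://www.youtube.com/youtubei/v1/";

    /** Builds the URL of an endpoint such as "next" or "guide". */
    static URI endpointUrl(final String endpoint, final String apiKey) {
        return URI.create(BASE_URL + endpoint + "?key="
                + URLEncoder.encode(apiKey, StandardCharsets.UTF_8));
    }

    /**
     * Builds a minimal JSON body. The field layout is an assumption.
     * No JSON escaping is done, so only pass simple values.
     */
    static String minimalBody(final String hl, final String gl,
                              final String clientVersion) {
        return "{\"context\":{\"client\":{"
                + "\"hl\":\"" + hl + "\","
                + "\"gl\":\"" + gl + "\","
                + "\"clientName\":\"WEB\","
                + "\"clientVersion\":\"" + clientVersion + "\"}}}";
    }

    public static void main(final String[] args) {
        // Placeholders only: a real key and client version must be supplied.
        System.out.println(endpointUrl("next", "INNERTUBE_API_KEY"));
        System.out.println(minimalBody("en", "US", "CLIENT_VERSION"));
    }
}
```

The body would be sent with a POST request; the extractor itself builds it with its own JSON builder helpers rather than by string concatenation.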

The changes in this PR need testing for exceptions under heavy traffic. If the API returns 429/Too Many Requests, support for this in the extractor needs to be checked (maybe sending cookies generated by a captcha with the request under heavy network traffic would bypass this error code).
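As a rough illustration of what needs checking, the sketch below maps HTTP status codes to outcomes the extractor would have to distinguish. The enum, the class and the retry rule are hypothetical; whether captcha cookies actually bypass a 429 is only a guess, as stated above.

```java
// Hypothetical status handling sketch, not the extractor's actual code.
final class InnertubeStatusSketch {
    enum Outcome { OK, RATE_LIMITED, FORBIDDEN, OTHER_ERROR }

    /** Classifies the status code of an InnerTube response. */
    static Outcome classify(final int statusCode) {
        switch (statusCode) {
            case 200:
                return Outcome.OK;
            case 429: // Too Many Requests
                return Outcome.RATE_LIMITED;
            case 403: // Forbidden
                return Outcome.FORBIDDEN;
            default:
                return Outcome.OTHER_ERROR;
        }
    }

    /** Assumption: only a 429 is worth retrying with captcha cookies. */
    static boolean shouldRetryWithCaptchaCookies(final int statusCode) {
        return classify(statusCode) == Outcome.RATE_LIMITED;
    }

    public static void main(final String[] args) {
        System.out.println(classify(429));
        System.out.println(shouldRetryWithCaptchaCookies(403));
    }
}
```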

Extraction of comments is fixed with this PR, as is the extraction of embeddable age-restricted videos.

Improvements made in this PR:

  • before this PR, the initial page was always fetched in order to get the continuations of a channel. That's no longer the case, thanks to a workaround using the ids field of the Page class, see e0011de
  • use a lightweight request to check if the hardcoded versions are valid:
    • for YouTube, use the guide endpoint, which returns the menu items of the website
    • for YouTube Music, use the get_search_suggestions endpoint with an empty string as the suggestion; the website uses it when it is loaded for the first time in a session.
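The lightweight validity check could look roughly like the sketch below. All names are hypothetical, the status fetcher is injected so the example runs without network access, and treating an HTTP 200 as "valid" and caching the result are both assumptions rather than the extractor's actual rule.

```java
import java.util.function.ToIntFunction;

// Hypothetical sketch of a cached validity check for hardcoded values.
final class HardcodedClientCheckSketch {
    // null until the first check has been made
    private Boolean cachedResult;
    // Maps an endpoint name (e.g. "guide") to the HTTP status it returned.
    private final ToIntFunction<String> statusFetcher;

    HardcodedClientCheckSketch(final ToIntFunction<String> statusFetcher) {
        this.statusFetcher = statusFetcher;
    }

    /** Sends one lightweight request and remembers the answer. */
    synchronized boolean areHardcodedValuesValid(final String lightweightEndpoint) {
        if (cachedResult == null) {
            // Assumption: a 200 response means key and version are accepted.
            cachedResult = statusFetcher.applyAsInt(lightweightEndpoint) == 200;
        }
        return cachedResult;
    }

    public static void main(final String[] args) {
        final HardcodedClientCheckSketch check =
                new HardcodedClientCheckSketch(endpoint -> 200); // fake fetcher
        System.out.println(check.areHardcodedValuesValid("guide"));
    }
}
```

If the check fails, the extractor would presumably fall back to extracting the key and version from the website, which is more expensive than a single small request.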

Screenshot of a 403/Forbidden error message (this should only happen if your IP was banned by Google):

Innertube API Forbidden

Endpoints changed:

  • channels (continuations improved, see above)
  • playlists
  • searches (for YouTube)
  • videos
  • mixes
  • comments

TO DO:

  • fix the JS player being fetched each time the extractor falls back to the desktop version (it should normally be cached).
  • add better spoofing of the mobile API (by analyzing the requests made by the Android client, which uses protobuf). This should be done in a separate PR.
  • add a method to reset the debofuscationCode, playerCode, playerJsUrl and signatureTimestamp strings.
  • update the mocks and client version when the PR is approved.
  • re-add the deleted code for views because its removal breaks the number of views in livestreams.
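For the caching and reset items in the list above, a minimal sketch of the intended behaviour might look like this. The class is hypothetical and only caches a single string; the real extractor keeps several static fields, and the downloader here is a stand-in for the actual network fetch.

```java
import java.util.function.Supplier;

// Hypothetical cache for the JS player code with an explicit reset.
final class PlayerCodeCacheSketch {
    private String playerCode; // null means "not fetched yet"
    private int fetchCount;
    private final Supplier<String> downloader;

    PlayerCodeCacheSketch(final Supplier<String> downloader) {
        this.downloader = downloader;
    }

    /** Downloads the player code on first use, then serves it from memory. */
    synchronized String getPlayerCode() {
        if (playerCode == null) {
            playerCode = downloader.get();
            fetchCount++;
        }
        return playerCode;
    }

    /** Forgets the cached code so the next call downloads it again. */
    synchronized void reset() {
        playerCode = null;
    }

    synchronized int getFetchCount() {
        return fetchCount;
    }

    public static void main(final String[] args) {
        final PlayerCodeCacheSketch cache =
                new PlayerCodeCacheSketch(() -> "/* player code */");
        cache.getPlayerCode();
        cache.getPlayerCode(); // served from the cache
        cache.reset();
        cache.getPlayerCode(); // downloaded again
        System.out.println(cache.getFetchCount());
    }
}
```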

I also reformatted some code to fit within the 100-character line limit and used final where possible, in the files that I changed.

This will close #568 (even though I still use the desktop version of the new internal API instead of the mobile version, except for videos, where I use the Android API if a video is protected by signatureCiphers).
Thanks to @FireMasterK for his findings.

APK for testing

See AudricV/NewPipe#1 for an up to date debug APK.

(Sorry in advance for my English.)

@AudricV AudricV added enhancement New feature or request youtube service, https://www.youtube.com/ labels Apr 9, 2021
@XiangRongLin
Collaborator

XiangRongLin commented Apr 9, 2021

@TiA4f8R Can you merge in the change more incrementally instead of all at once? Basically, finish up the current channel, search and playlist extractors and get those merged, or even split them up into separate PRs.

@opusforlife2
Collaborator

Then later, all the related PRs for this change can be linked to each other in the OP in a bullet list to make it easier to navigate for future devs.

@AudricV AudricV force-pushed the youtubei-api branch 4 times, most recently from 0e69831 to 8022275 Compare April 11, 2021 14:26
@AudricV AudricV changed the title Use the youtubei API in NewPipe Extractor Use the youtubei API in NewPipe Extractor for channels, playlists and searches Apr 11, 2021
@AudricV AudricV marked this pull request as ready for review April 11, 2021 16:22
@AudricV AudricV marked this pull request as draft April 15, 2021 10:31
@AudricV AudricV force-pushed the youtubei-api branch 3 times, most recently from b61d54f to 59e20e0 Compare April 19, 2021 17:16
@AudricV AudricV marked this pull request as ready for review April 19, 2021 17:27
@AudricV AudricV force-pushed the youtubei-api branch 2 times, most recently from 3eb2e9a to d607fe9 Compare April 21, 2021 17:48
@AudricV
Member Author

AudricV commented Apr 21, 2021

@XiangRongLin I think I fixed all tests because the CI passed. What do you think?

@XiangRongLin
Collaborator

From the mock standpoint, if the tests are passing, then it should be fine.

@AudricV AudricV force-pushed the youtubei-api branch 3 times, most recently from 47dd511 to f7bca36 Compare April 29, 2021 10:28
AudricV and others added 15 commits August 1, 2021 12:39
…deos + update clients version

Here are the requests that will now be made by the `onFetchPage` method of `YoutubeStreamExtractor`:

- the desktop API is fetched.

If there is no streaming data, the desktop player API with the embed client screen will be fetched (along with the player code), then the Android mobile API.
- if there is no streaming data, a `ContentNotAvailableException` will be thrown, using the message provided in the playability status

If the video is age-restricted, a request to the next endpoint of the desktop player with the embed client screen will be sent.
Otherwise, the next endpoint will be fetched normally, if the content is available.

If the video is not age-restricted, a request to the player endpoint of the Android mobile API will be made.

We can get more streams by using the Android mobile API, but some streams may not be available on this API, so the streaming data of the Android mobile API is used first to get itags, and then the streaming data of the desktop internal API is used.
If parsing the Android mobile API response fails, only the streams of the desktop API will be used.
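One possible reading of the stream selection described above is sketched here: Android entries are taken first and desktop entries only fill in itags that are still missing. This is an interpretation of the commit message, not the extractor's actual code; the class, the method and the use of plain strings for stream URLs are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical merge of stream URLs keyed by itag.
final class StreamMergeSketch {
    /**
     * Android streams take precedence; desktop streams are added only for
     * itags the Android response did not provide. A null Android map models
     * a failed parse, in which case only desktop streams are returned.
     */
    static Map<Integer, String> mergeByItag(final Map<Integer, String> androidStreams,
                                            final Map<Integer, String> desktopStreams) {
        final Map<Integer, String> merged = new LinkedHashMap<>();
        if (androidStreams != null) {
            merged.putAll(androidStreams);
        }
        desktopStreams.forEach(merged::putIfAbsent);
        return merged;
    }

    public static void main(final String[] args) {
        final Map<Integer, String> android = new LinkedHashMap<>();
        android.put(18, "android-18");
        final Map<Integer, String> desktop = new LinkedHashMap<>();
        desktop.put(18, "desktop-18");
        desktop.put(137, "desktop-137");
        System.out.println(mergeByItag(android, desktop));
    }
}
```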

Other code changes:

- `prepareJsonBuilder` in `YoutubeParsingHelper` was renamed to `prepareDesktopJsonBuilder`
- `prepareMobileJsonBuilder` in `YoutubeParsingHelper` was renamed to `prepareAndroidMobileJsonBuilder`
- two new methods in `YoutubeParsingHelper` were added: `prepareDesktopEmbedVideoJsonBuilder` and `prepareAndroidMobileEmbedVideoJsonBuilder`
- `createPlayerBodyWithSts` is now public and was moved to `YoutubeParsingHelper`
- a new method in `YoutubeJavaScriptExtractor` was added: `resetJavaScriptCode`, which was needed for the method `resetDebofuscationCode` of `YoutubeStreamExtractor`
- `areHardcodedClientVersionAndKeyValid` in `YoutubeParsingHelper` now returns a `boolean` instead of an `Optional<Boolean>`
- the `fetchVideoInfoPage` method of `YoutubeStreamExtractor` was removed because YouTube now returns 404 for every client on the `get_video_info` page
- some unused objects in `YoutubeStreamExtractor` were removed and some warnings were fixed

Co-authored-by: TiA4f8R <[email protected]>
Migrate YouTube comments to the desktop version by using the `next` endpoint of the InnerTube internal API.
With the desktop version, we are able to get the exact like count of YouTube comments by parsing the accessibility data (the current extraction is used as a fallback). We are also now able to tell whether the uploader of a comment is verified.
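A minimal sketch of the like count parsing with a fallback could look like the following. The helper is hypothetical, and the sample label text is only an assumed example of what the accessibility data might contain; stripping every non-digit would give a wrong result if a label ever contained more than one number.

```java
// Hypothetical parser for the like count found in an accessibility label.
final class CommentLikeCountSketch {
    /**
     * Keeps only the digits of the label and parses them. Returns the
     * fallback (e.g. the previously extracted, rounded count) when the
     * label is missing or contains no digits.
     */
    static long parseLikeCount(final String accessibilityLabel, final long fallback) {
        if (accessibilityLabel == null) {
            return fallback;
        }
        final String digits = accessibilityLabel.replaceAll("\\D+", "");
        if (digits.isEmpty()) {
            return fallback;
        }
        try {
            return Long.parseLong(digits);
        } catch (final NumberFormatException e) {
            return fallback; // too many digits to fit in a long
        }
    }

    public static void main(final String[] args) {
        // Assumed sample label, not taken from a real response.
        System.out.println(parseLikeCount(
                "Like this comment along with 1,234 other people", -1));
    }
}
```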

Co-authored-by: TiA4f8R <[email protected]>
…ions of YouTube Music search results

The clickTrackingParams of YouTube Music search results are not needed to get continuations. This commit removes their use, which may improve privacy.
…ctorTest

Without removing the RunWith and SuiteClasses annotations (and the corresponding imports) in YoutubePlaylistExtractorTest and YoutubeMixPlaylistExtractorTest, some mocks cannot be generated, so the CI fails because of the missing mocks. Mocks of working tests have also been updated.
@AudricV AudricV dismissed B0pol’s stale review August 2, 2021 10:30

Outdated review

@AudricV AudricV requested a review from Stypox August 2, 2021 10:30
Member

@Stypox Stypox left a comment

Looks good to me, finally! I'm gonna test some things and if I see nothing wrong I'll merge this and open a PR for the hotfix. Are you all ok with this?

final byte[] body = JsonWriter.string(prepareDesktopJsonBuilder(localization,
                getExtractorContentCountry())
        .value("browseId", "VL" + getId())
        .value("params", "wgYCCAA%3D") // Show unavailable videos

Member

Are we sure about this? What is the purpose of showing unavailable videos in playlists? YouTube does not show them normally, and in NewPipe they would just create problems. Anyway, we'll think about this later.

Member Author

Before a YouTube update, they were shown every time, so I thought it may be useful for some users (premieres, temporary georestrictions (think of music releases), ...).

Member

There was a button on YouTube to show them manually if you were logged in iirc.

Member

On a technical side, this should be just base64 URL-encoded protobuf.
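To check that claim, the `params` value from the snippet under review can be URL-decoded and then Base64-decoded with the standard library. The class and method names below are hypothetical; only the value `wgYCCAA%3D` comes from the PR.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that decodes an InnerTube "params" value to raw bytes.
final class PlaylistParamsSketch {
    /** Undoes the percent-encoding, then the URL-safe Base64 encoding. */
    static byte[] decodeParams(final String params) {
        final String base64 = URLDecoder.decode(params, StandardCharsets.UTF_8);
        return Base64.getUrlDecoder().decode(base64);
    }

    /** Formats bytes as lowercase hex pairs separated by spaces. */
    static String toHex(final byte[] bytes) {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(toHex(decodeParams("wgYCCAA%3D"))); // prints "c2 06 02 08 00"
    }
}
```

Read as protobuf wire format, these five bytes look like a length-delimited field 104 wrapping a varint field 1 with value 0. What those fields mean is not documented in this thread.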

Member Author

You don't have to be logged in to show them manually: it also works if you are a guest.

@Stypox
Member

Stypox commented Aug 3, 2021

OK, I tested all of the features I could think of, as much as I could. It works well. Thank you @TiA4f8R and @FireMasterK for your hard work :-)

Labels
enhancement New feature or request youtube service, https://www.youtube.com/
Development

Successfully merging this pull request may close these issues.

[Findings] Non rate limited YouTube Mobile API
9 participants