Automatic Opt-out of Channels due to Refresh Token Error #337

Open
zeeshanakram3 opened this issue Oct 18, 2024 · 1 comment

@zeeshanakram3
Contributor

Context

Previously, we used yt-dlp (https://github.com/yt-dlp/yt-dlp) to download channel info, video lists, video metadata, and the videos themselves from all channels for syncing. However, a few months ago YouTube became more restrictive: it began rate-limiting our requests and eventually blocked the machine's IP address due to excessive usage.

Changes Made

  • Switched to the authenticated YouTube Data API (OAuth) for fetching channel info and video metadata.
  • Reduced download concurrency so that downloads run almost sequentially.
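
Making downloads "nearly sequential" amounts to capping how many are in flight at once. Below is a minimal TypeScript sketch of such a cap. It assumes each download can be modelled as an async function, and `runWithConcurrency` is a hypothetical helper, not code from the repository:

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight. With limit = 1 processing is strictly sequential.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until none remain.
  // `next++` cannot interleave because JavaScript runs one task at a time.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results; // results keep the input order
}
```

With `limit = 1` the channels are processed one after another. A small value such as 2 gives roughly the "nearly sequential" behaviour described above while still overlapping some network waits.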

Current Issue

Because of the switch to the authenticated YouTube Data API, the following check in YoutubePollingService.ts automatically opted out more than 10,000 channels:
Code Reference

During the investigation, Google returned the following error when fetching channel info:

{ error: 'invalid_grant', error_description: 'Bad Request' }

Google's OAuth documentation indicates that a refresh token may become invalid if it has not been used for six months (Reference). This could explain the issue: we switched to the YouTube API only recently, after relying on yt-dlp for over a year, so many stored refresh tokens may have gone unused for longer than that.
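
`invalid_grant` is the standard OAuth 2.0 error code (RFC 6749, section 5.2) for an authorization grant or refresh token that is invalid, expired, or revoked. It therefore points at the stored refresh token. The sketch below shows one way a refresh failure could be classified by that code. The type and function names are hypothetical and not taken from YoutubePollingService.ts:

```typescript
// Shape of an OAuth 2.0 token-endpoint error body (RFC 6749, section 5.2).
interface OAuthErrorBody {
  error: string;
  error_description?: string;
}

// Hypothetical classification used by a caller deciding what to do next.
type RefreshFailureKind = "grant_revoked_or_expired" | "other";

function isOAuthErrorBody(value: unknown): value is OAuthErrorBody {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { error?: unknown }).error === "string"
  );
}

// `invalid_grant` means the refresh token itself is no longer usable.
// Every other error (or an unrecognised body) is reported as "other"
// so the caller can handle it separately.
function classifyRefreshFailure(body: unknown): RefreshFailureKind {
  if (isOAuthErrorBody(body) && body.error === "invalid_grant") {
    return "grant_revoked_or_expired";
  }
  return "other";
}
```

With this distinction, a check could send channels classified as `grant_revoked_or_expired` into a re-authorization flow rather than treating them like other failures.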

Impact

  • The above-mentioned check automatically opted out over 10,000 channels.
  • Many refresh tokens may have become invalid due to prolonged inactivity.
  • DynamoDB does not retain the previous state (the yppStatus field in the channels table), but the required data is available in HubSpot, where each field is versioned.
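
HubSpot keeps a version history for each field, so restoring a channel's previous yppStatus reduces to picking the newest recorded value from before the automatic opt-outs began. Below is a minimal sketch of that selection step. It assumes the history has already been fetched as a list of `{ value, timestamp }` entries and that the cutoff time is known (for example, from service logs). The `PropertyVersion` shape and `valueBefore` name are hypothetical. Fetching from HubSpot and writing back to DynamoDB are not shown:

```typescript
// Hypothetical shape of one versioned field value; adapt the field names
// to whatever the HubSpot history response actually contains.
interface PropertyVersion {
  value: string;
  timestamp: string; // ISO-8601 timestamp of when this value was set
}

// Returns the value that was current just before `cutoff`: the newest
// version whose timestamp is strictly earlier than the cutoff.
// Input order does not matter; unparseable timestamps are skipped.
function valueBefore(history: readonly PropertyVersion[], cutoff: Date): string | undefined {
  const cutoffMs = cutoff.getTime();
  let best: PropertyVersion | undefined;
  let bestMs = -Infinity;
  for (const version of history) {
    const t = Date.parse(version.timestamp);
    if (Number.isNaN(t) || t >= cutoffMs) continue;
    if (t > bestMs) {
      best = version;
      bestMs = t;
    }
  }
  return best === undefined ? undefined : best.value;
}
```

An `undefined` result means the channel had no recorded status before the cutoff. Such channels would need manual review rather than an automatic revert.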

Temporary Fix

For now, the following code has been commented out to prevent further automatic opt-outs:
Code Reference

Potential Solutions

  1. Revert the Channel Status:

    • If we continue using the YouTube API for channel info, users will need to re-authorize the Gleev app, which may not be feasible at scale.
    • Alternatively, revert the yppStatus field of all 10,000+ affected channels using the history stored in HubSpot. This would involve:
      • Writing a script to fetch the previous state of all affected channels from HubSpot.
      • Updating the state in DynamoDB accordingly.
  2. Return to yt-dlp for Data Retrieval:

    • Switch back to yt-dlp for fetching channel info, video lists, etc., while addressing the IP blockage.
    • Possible mitigations for IP blockage include rotating proxies or a pool of IP addresses.
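
If the project returns to yt-dlp, rotating requests across a pool of proxies can be expressed as a round-robin selector that skips proxies recently marked as blocked. This is a sketch under stated assumptions. The proxy URLs, the cooldown policy, and how a block is detected (for example, repeated HTTP 429 responses) are all placeholders. Only the `--proxy` option belongs to yt-dlp itself:

```typescript
// Hypothetical round-robin proxy pool with a cooldown for proxies that
// appear to be blocked.
class ProxyPool {
  private readonly proxies: readonly string[];
  private readonly cooldownMs: number;
  private readonly blockedUntil = new Map<string, number>();
  private index = 0;

  constructor(proxies: readonly string[], cooldownMs: number) {
    if (proxies.length === 0) throw new Error("proxy pool is empty");
    this.proxies = proxies;
    this.cooldownMs = cooldownMs;
  }

  // Returns the next proxy (in round-robin order) that is not cooling
  // down, or undefined if every proxy is currently blocked.
  next(now: number = Date.now()): string | undefined {
    for (let i = 0; i < this.proxies.length; i++) {
      const pos = (this.index + i) % this.proxies.length;
      const proxy = this.proxies[pos];
      if ((this.blockedUntil.get(proxy) ?? 0) <= now) {
        this.index = (pos + 1) % this.proxies.length;
        return proxy;
      }
    }
    return undefined;
  }

  // Call when a request through `proxy` looks rate-limited or blocked.
  markBlocked(proxy: string, now: number = Date.now()): void {
    this.blockedUntil.set(proxy, now + this.cooldownMs);
  }
}

// Builds yt-dlp arguments that route one request through the chosen
// proxy using yt-dlp's --proxy option.
function ytDlpArgs(videoUrl: string, proxy: string): string[] {
  return ["--proxy", proxy, videoUrl];
}
```

Each download would call `next()`, pass the result to yt-dlp via `ytDlpArgs`, and call `markBlocked()` when the request appears rate-limited. When `next()` returns `undefined`, every proxy is cooling down and the caller should back off.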

Next Steps

  • Decide on the approach: either continue with the YouTube API and handle re-authorization, or switch back to yt-dlp.
  • Write a script to fetch previous channel states from HubSpot if we opt to revert statuses.
  • Plan how to mitigate IP blockage if we return to yt-dlp.

References

@bedeho
Member

bedeho commented Oct 18, 2024

Excellent breakdown, thank you @zeeshanakram3
