Avoid YouTube Captcha #12229
I wonder if instead of hardcoding YouTube URLs we could long-term cache discovered OEmbed endpoints. One question is how to map discovered OEmbed endpoints back onto URL schemes like |
I have been thinking about that since yesterday. One way to do it would be to run the initial oEmbed discovery the way it is done today and, if an endpoint is found, run 2 regex matches against it: one mandatory, the other optional. When the mandatory one does not match, continue the way it works right now. The matches would be something like: Mandatory: match According to the oEmbed Spec:
So, for a URL like: The cache would be an array like:
For a URL like: The cache would be an array like:
The cache could last 24 hours or something like that. When a new preview_card needs to be generated, check whether a cached endpoint exists for the domain and, if it does, just append the encoded URL to the endpoint (with the format, if present).
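To make the idea above concrete, here is a minimal sketch of a per-domain endpoint cache with a 24-hour lifetime. It is not the code from this PR: the class name OEmbedEndpointCache, its methods, and the in-memory Hash are all hypothetical, and it parses the query string with Ruby's URI module instead of the two regex matches described above. A real implementation would more likely sit on top of Rails.cache or Redis.

```ruby
require 'uri'

# Hypothetical sketch: remembers the oEmbed endpoint discovered for a domain
# so later preview cards can skip fetching the HTML page.
class OEmbedEndpointCache
  TTL = 24 * 60 * 60 # cache lifetime in seconds (24 hours)

  Entry = Struct.new(:endpoint, :format, :expires_at)

  # clock is injectable so expiry can be exercised without waiting.
  def initialize(clock: -> { Time.now })
    @entries = {}
    @clock = clock
  end

  # discovered_href is the href of the oEmbed <link> found on page_url.
  # The "url" parameter is mandatory, "format" is optional.
  # Returns false (and caches nothing) when "url" is missing.
  def store(page_url, discovered_href)
    params = URI.decode_www_form(URI.parse(discovered_href).query.to_s).to_h
    return false unless params.key?('url')

    # Keep only the part before the query string; any other query
    # parameters on the discovered endpoint are dropped in this sketch.
    endpoint = discovered_href.split('?').first
    host = URI.parse(page_url).host
    @entries[host] = Entry.new(endpoint, params['format'], @clock.call + TTL)
    true
  end

  # Builds the oEmbed request URL for page_url from the cached endpoint,
  # or returns nil when nothing usable is cached for that domain.
  def oembed_url_for(page_url)
    entry = @entries[URI.parse(page_url).host]
    return nil if entry.nil? || entry.expires_at <= @clock.call

    query = { 'url' => page_url }
    query['format'] = entry.format if entry.format
    "#{entry.endpoint}?#{URI.encode_www_form(query)}"
  end
end
```

On a cache hit the caller would request the returned URL directly; on nil it would fall back to the existing discovery path.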
OK, so I updated the code to do what I mentioned in the previous comment. It's the first time I have attempted some sort of "serious" coding in Ruby, and I am just relying on coding experience from other languages. So I'm pretty sure it could be cleaner (safer?), but from my testing it works and it's pretty fast when the cache is used. Feel free not to use it, or to copy/adapt/change it if you think something is useful.
Even with caching of the YouTube oEmbed links, I still got the captcha on my servers. It worked for a couple of days, but the captcha ended up coming back. I have hardcoded the YouTube oEmbed endpoint on my fork.
Gradually during the past week, some of my servers started running into an issue with YouTube where the preview_cards for videos were failing.
After doing some debugging I noticed that the HTML fetched in fetch_link_card_service.rb was a YouTube reCaptcha page instead of the video page. On this page there is no link[@type="application/json+oembed"] or link[@type="text/xml+oembed"] present, so the link to oEmbed is never found.
I think this is related to the number of requests my servers make to YouTube and what Google calls automated traffic.
This is a workaround to bypass the problem. I know it's not ideal and it creates an exception/hardcode for YouTube, but that was all I could think of to solve the issue.
I am not a Rails coder, so please do double-check and suggest corrections to what I have done.
For both of these reasons I completely understand if this is never merged to master but decided to share this in case someone else runs into the same issue or to see if we can find a better solution.
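For readers who hit the same captcha, the workaround described above can be sketched roughly as follows. This is an illustration, not the diff from this PR: the method name hardcoded_oembed_url and the host list are assumptions, and the only external detail relied on is YouTube's oEmbed endpoint at https://www.youtube.com/oembed. The point is that the video page is never fetched, so there is no HTML response that can turn out to be a reCaptcha page.

```ruby
require 'uri'

# Assumed list of hosts to treat as YouTube; adjust as needed.
YOUTUBE_HOSTS = %w(youtube.com www.youtube.com m.youtube.com youtu.be).freeze
YOUTUBE_OEMBED_ENDPOINT = 'https://www.youtube.com/oembed'

# Hypothetical helper: returns the oEmbed request URL for a YouTube link
# without fetching the video page, or nil for any other (or invalid) URL.
def hardcoded_oembed_url(page_url)
  host = URI.parse(page_url).host
  return nil unless YOUTUBE_HOSTS.include?(host&.downcase)

  "#{YOUTUBE_OEMBED_ENDPOINT}?#{URI.encode_www_form(format: 'json', url: page_url)}"
rescue URI::InvalidURIError
  nil
end
```

A nil result means the URL is not a YouTube link, and the caller would continue with the normal discovery based on the link tags in the fetched HTML.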