
400 Bad Response #48

Open
aclevername opened this issue Dec 17, 2020 · 5 comments

Comments

@aclevername

Hello!

Our team has started to notice failures when checking links on twitter.com. E.g.:

        ERROR   https://twitter.com/ashleymcnamara
                Bad Request (HTTP error 400)

When I access the above link in a browser it loads fine. After some research, this might be related to legacy Twitter being shut down: https://screenrant.com/twitter-legacy-nintendo-3ds-shut-down-date-december-2020/

Thanks,
Jake & @Callisto13

@MichaIng

Same here. It's a Twitter internal DDoS protection or similar. Other hosts cause similar issues: #42

It is a natural issue when checking many links concurrently and/or in quick succession against the same host. The only solution I can think of is to allow defining not only the overall number of concurrent connections, but also a maximum number of connections to the same host within a certain time range, so that liche waits out that interval before continuing to check links on that host. But this is quite a large task, I guess, especially since it would need to be configured per host to match each host's individual limits.
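The per-host throttling idea above can be sketched roughly as follows. This is a hypothetical illustration, not liche's actual code; the class and names are invented for the example, and it enforces a minimum interval between requests to the same host rather than a full requests-per-time-window limit:

```python
# Hypothetical sketch of per-host request throttling, not liche's code.
# Each host gets a minimum delay between consecutive requests; different
# hosts are unaffected by each other's schedule.
import time
from threading import Lock
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, interval: float):
        self.interval = interval          # seconds between requests per host
        self.last: dict[str, float] = {}  # host -> last scheduled request time
        self.lock = Lock()

    def wait(self, url: str) -> None:
        host = urlparse(url).netloc
        with self.lock:
            now = time.monotonic()
            ready = self.last.get(host, 0.0) + self.interval
            slot = max(now, ready)        # earliest allowed time for this host
            self.last[host] = slot
        time.sleep(max(0.0, slot - now))  # block until our slot arrives


throttle = HostThrottle(interval=0.1)
start = time.monotonic()
for link in ["https://twitter.com/a",
             "https://twitter.com/b",    # same host: delayed by the interval
             "https://example.com/c"]:   # different host: no extra delay
    throttle.wait(link)
print(time.monotonic() - start >= 0.1)   # → True (second twitter.com hit waited)
```

A real implementation would also need the per-host limits to be configurable, since each site enforces different thresholds.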

@aclevername
Author

> Same here. It's a Twitter internal DDoS protection or similar. Other hosts cause similar issues: #42
>
> It is a natural issue when checking many links concurrently and/or in quick succession against the same host. The only solution I can think of is to allow defining not only the overall number of concurrent connections, but also a maximum number of connections to the same host within a certain time range, so that liche waits out that interval before continuing to check links on that host. But this is quite a large task, I guess, especially since it would need to be configured per host to match each host's individual limits.

I'm not sure if it's related to that issue; I think it's more likely caused by this: https://news.ycombinator.com/item?id=25464280

@MichaIng

Ah, that is true. I assumed too quickly, since Twitter is one of the hosts that often shows up alone in the list of failed links, but indeed with a different error response. Okay, that 400 is new and can be replicated via `curl --head https://twitter.com` and `curl https://twitter.com`. Then it needs to be excluded for now.

@raviqqe
Owner

raviqqe commented Dec 19, 2020

Sorry, I decided to deprecate this project! Please consider using the alternatives listed there. Thank you all for your cooperation!

https://github.com/raviqqe/liche#deprecation-notice

@MichaIng

Sad to hear. Nobody (okay, I can only speak for myself) expects a tool to automatically handle every kind of website rate limiting and other anomalies. E.g. while GitHub token support in lychee is indeed nice to have, I would never add it to my own tool (if I developed one), as eventually users, or I myself, would come to expect individual handlers and options for every larger website, which is totally insane IMO. All that is needed to address the issues you mention is some flexibility to customise how each response code is handled. But that is probably more work than I imagine, and of course I respect the decision, and wish you all the best, starting with some nice holidays. Thanks for your great work, which has helped us detect a lot of broken URLs by now 👍 😃.

3 participants