How to deal with shortened URLs #45

Open
lukasIO opened this issue Apr 14, 2017 · 3 comments

lukasIO commented Apr 14, 2017

Hi,

Is there a way to retrieve the landing URL of a shortened URL like goo.gl/89234fIASVHAS?
Right now the crawler will pass the shortened url into the callback, which messes up all relative links on the crawled pages...
Thanks!
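To illustrate why the URL handed to the callback matters: if relative links on a page were resolved against the shortened URL rather than the final URL after redirects, they would point back at the shortener's domain. Below is a minimal sketch using the standard WHATWG `URL` constructor. The goo.gl link is the one from this issue. The landing page URL and the relative link are made-up examples.

```typescript
// The short link comes from the issue; the landing page URL is hypothetical.
const shortUrl = "https://goo.gl/89234fIASVHAS";
const finalUrl = "https://example.com/blog/post.html";

// A relative link as it might appear in the landing page's HTML.
const href = "images/diagram.png";

// Resolving against the shortened URL points at goo.gl, which is wrong.
const resolvedAgainstShort = new URL(href, shortUrl).href;

// Resolving against the final URL after redirects gives the intended target.
const resolvedAgainstFinal = new URL(href, finalUrl).href;

console.log(resolvedAgainstShort); // https://goo.gl/images/diagram.png
console.log(resolvedAgainstFinal); // https://example.com/blog/images/diagram.png
```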

amoilanen self-assigned this Apr 16, 2017
amoilanen (Owner) commented:

From a quick look, it seems that bit.ly uses status 301 "Moved Permanently" and goo.gl uses 307 "Temporary Redirect". I will need to investigate the case of URL shorteners a bit more.
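Both 301 and 307 responses carry the target in a `Location` header, so the landing URL can be found by following that header hop by hop until a non-redirect response arrives. The sketch below assumes a hypothetical `hop` function that performs one request *without* following redirects and reports the status and `Location` value. `resolveLandingUrl`, `Hop` and `HopFn` are illustrative names, not part of the crawler's API.

```typescript
// One HTTP hop as seen without following redirects (hypothetical shape).
interface Hop {
  status: number;
  location?: string; // value of the Location header, if any
}

// Performs a single request to `url` without following redirects (assumption).
type HopFn = (url: string) => Promise<Hop>;

// Common redirect statuses; 301 (bit.ly) and 307 (goo.gl) are the ones seen here.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

// Follow Location headers until a non-redirect response, making at most
// `maxHops` requests. Returns the landing URL and every URL visited.
async function resolveLandingUrl(
  start: string,
  hop: HopFn,
  maxHops = 10,
): Promise<{ landingUrl: string; chain: string[] }> {
  let current = start;
  const chain = [current];
  for (let i = 0; i < maxHops; i++) {
    const { status, location } = await hop(current);
    if (!REDIRECT_STATUSES.has(status) || !location) {
      return { landingUrl: current, chain };
    }
    // Location may be relative, so resolve it against the current URL.
    current = new URL(location, current).href;
    chain.push(current);
  }
  throw new Error(`Too many redirects starting from ${start}`);
}
```

Resolving each `Location` with `new URL(location, current)` matters because a redirect target may be relative. The hop limit guards against redirect loops.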


lukasIO commented Apr 26, 2017

Thanks for your reply. Do you have any advice on how to work around it for now?
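One possible workaround, assuming Node 18+ where `fetch` is global and follows redirects by default: expand the shortened URL yourself, read `Response.url` (the final URL after redirects were followed), and give that to the crawler as its start URL. `expandShortUrl` and `Fetcher` are hypothetical names. The fetch function is injected so the logic can be checked without network access.

```typescript
// Minimal shape of a fetch-like function. Node 18+'s global fetch fits it,
// since a Response exposes `url`, the final URL after redirects.
type Fetcher = (url: string) => Promise<{ url: string }>;

// Resolve a shortened URL up front so crawling starts from the landing page.
async function expandShortUrl(shortUrl: string, fetcher: Fetcher): Promise<string> {
  const response = await fetcher(shortUrl);
  // Response.url is empty for synthetic responses; fall back to the input.
  return response.url || shortUrl;
}
```

For example, `await expandShortUrl("https://goo.gl/89234fIASVHAS", fetch)` would return the landing URL, which can then be passed to the crawler in place of the short link. This costs one extra request before crawling starts.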

amoilanen (Owner) commented:

> Right now the crawler will pass the shortened url into the callback

I fixed this part, added a unit test, and published a new version of the crawler, 0.3.19.

However, I could not reproduce the original issue, where passing the first URL into the onSuccess callback caused problems with relative URLs:

> which messes up all relative links on the crawled pages...

Please let me know whether the recent changes fix the problem.
