Detecting missing links and 404s #492

Closed
asitemade4u opened this issue Apr 30, 2020 · 3 comments · Fixed by #594
Labels
area/drivers (HTML drivers), good first issue (Good for newcomers), type/enhancement (New feature or request)

Comments

@asitemade4u

The developer of the website I intend to scrape information from is sloppy and has left a lot of broken links.
When I execute an otherwise effective Ferret script on a list of pages, it stops altogether at every 404.
Is there a DOCUMENT_EXISTS function, or anything similar, that would let the script carry on?

@asitemade4u changed the title from "Detecting 404" to "Detecting 404s" on Apr 30, 2020
@asitemade4u changed the title from "Detecting 404s" to "Detecting missing links and 404s" on Apr 30, 2020
@ziflex added the area/drivers, good first issue, and type/enhancement labels on May 2, 2020
@ziflex
Member

ziflex commented May 2, 2020

Nope, there is no such function. But we can come up with something like that.

@ksdme

ksdme commented May 22, 2020

@ziflex I would like to pick up this issue.

@ziflex
Member

ziflex commented Feb 19, 2021

@asitemade4u could you provide some links Ferret is failing on?

404s are handled if the target website handles them too, i.e. it still serves a page along with the 404 status:

LET p = DOCUMENT('http://google.com/fdsfsd', { driver: "cdp" })

RETURN p.response.statusCode
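
Building on the snippet above, here is a minimal sketch of how a script might skip broken links while walking a list of pages. It reuses only `DOCUMENT` and `p.response.statusCode` as shown above. The `urls` list is a hypothetical placeholder, and the sketch assumes the target site actually serves a response for a missing page (so `DOCUMENT` returns instead of failing). It has not been run against a specific Ferret version.

```
// Hypothetical list of pages to visit; replace with real links.
LET urls = [
    "http://google.com/",
    "http://google.com/fdsfsd"
]

FOR url IN urls
    // Load the page with the CDP driver, as in the snippet above.
    LET p = DOCUMENT(url, { driver: "cdp" })

    // Keep only pages that did not come back as 404.
    FILTER p.response.statusCode != 404

    RETURN url
```

If the page load itself fails (for example, when the host is unreachable), this filter is never reached, so that case would still need separate handling.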
