
URLHeadBear should use robots.txt #1782

Open · jayvdb opened this issue May 27, 2017 · 2 comments · May be fixed by #2891

Comments

jayvdb (Member) commented May 27, 2017

https://en.wikipedia.org/wiki/Robots_exclusion_standard

Any request that isn't allowed by robots.txt should be reported as such.

meetmangukiya (Member) (participating)

refeed (Member) commented May 31, 2017

We can use https://docs.python.org/2/library/robotparser.html for this, with some enhancements such as caching the robots.txt of each site.
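A minimal sketch of that idea, assuming Python 3's `urllib.robotparser` (the Python 3 name of the `robotparser` module linked above); the `_parsers` cache and `is_allowed` helper are illustrative names, not part of coala-bears:

```python
from urllib import robotparser
from urllib.parse import urljoin, urlsplit

# Illustrative per-site cache: one parsed robots.txt per scheme://netloc.
_parsers = {}


def is_allowed(url, user_agent='coala-bears'):
    """Return True if the site's robots.txt permits fetching ``url``."""
    parts = urlsplit(url)
    site = '{}://{}'.format(parts.scheme, parts.netloc)
    parser = _parsers.get(site)
    if parser is None:
        parser = robotparser.RobotFileParser(urljoin(site, '/robots.txt'))
        parser.read()  # fetch and parse robots.txt once per site
        _parsers[site] = parser
    return parser.can_fetch(user_agent, url)
```

URLHeadBear could then report any URL for which `is_allowed` returns False, rather than issuing a HEAD request for it.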

@jayvdb jayvdb changed the title InvalidLinkBear should use robots.txt URLHeadBear should use robots.txt Apr 7, 2018
PrajwalM2212 added a commit to PrajwalM2212/coala-bears that referenced this issue Mar 16, 2019

Requests that are not allowed by robots.txt are reported.

Closes coala#1782
PrajwalM2212 linked a pull request Mar 16, 2019 that will close this issue