
URLHeadBear should use robots.txt #1782

Open · jayvdb opened this issue May 27, 2017 · 2 comments · May be fixed by #2891

Comments

jayvdb (Member) commented May 27, 2017

https://en.wikipedia.org/wiki/Robots_exclusion_standard

Any request that isn't allowed by robots.txt should be reported as such.

meetmangukiya (Member) (participating)

refeed (Member) commented May 31, 2017

We can use https://docs.python.org/2/library/robotparser.html for this, with some enhancements such as caching the robots.txt of each site.
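A minimal sketch of that idea, assuming Python 3's `urllib.robotparser` (the Python 3 name of the `robotparser` module linked above); the `_parsers` cache and `is_allowed` helper are illustrative names, not part of coala-bears:

```python
from urllib import robotparser
from urllib.parse import urljoin, urlsplit

# Illustrative per-site cache: one parsed robots.txt per scheme://netloc.
_parsers = {}


def is_allowed(url, user_agent='coala-bears'):
    """Return True if the site's robots.txt permits fetching ``url``."""
    parts = urlsplit(url)
    site = '{}://{}'.format(parts.scheme, parts.netloc)
    parser = _parsers.get(site)
    if parser is None:
        parser = robotparser.RobotFileParser(urljoin(site, '/robots.txt'))
        parser.read()  # fetch and parse robots.txt once per site
        _parsers[site] = parser
    return parser.can_fetch(user_agent, url)
```

URLHeadBear could then report any URL for which `is_allowed` returns False, rather than issuing a HEAD request for it.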

@jayvdb jayvdb changed the title InvalidLinkBear should use robots.txt URLHeadBear should use robots.txt Apr 7, 2018
PrajwalM2212 added a commit to PrajwalM2212/coala-bears that referenced this issue Mar 16, 2019

Requests that are not allowed by robots.txt are reported.

Closes coala#1782
PrajwalM2212 linked a pull request Mar 16, 2019 that will close this issue