Hey,

while integrating spiderbar's `can_fetch()` into the robotstxt package I encountered a test case where `can_fetch()` and `paths_allowed(check_method = "robotstxt")` differ. I discussed this in seomoz/rep-cpp#33 ... and I think the robotstxt checking method makes plausible but simply wrong assumptions about how robots.txt files work. I will fix this within the robotstxt package and from then on default to the much faster spiderbar/rep-cpp backend for simple path checking.

Consider the following robots.txt file:
```
User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/

User-agent: *
Disallow: /unsinn/
Disallow: /temp/
Disallow: /newsticker.shtml
```
Comparing the two methods on this file, `can_fetch()` seems to ignore the rules that ought to apply to all bots (the `User-agent: *` group) whenever a specific bot name / user agent is used.
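The original reproduction snippet did not survive here, but the comparison can be sketched roughly as follows. This is a minimal sketch, not the exact test case from the issue; it assumes spiderbar's `robxp()`/`can_fetch()` interface and robotstxt's `paths_allowed()` with a `robotstxt_list` argument, and argument names may differ across package versions:

```r
library(spiderbar)   # rep-cpp backend
library(robotstxt)

rtxt_text <- '
User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/

User-agent: *
Disallow: /unsinn/
Disallow: /temp/
Disallow: /newsticker.shtml
'

# spiderbar / rep-cpp: parse once, then query a path for a specific agent
rep <- robxp(rtxt_text)
can_fetch(rep, "/temp/some_file.txt", user_agent = "mein-Robot")

# robotstxt's own checker on the same file and path
paths_allowed(
  paths          = "/temp/some_file.txt",
  robotstxt_list = list(rtxt_text),
  bot            = "mein-Robot",
  check_method   = "robotstxt"
)

# The two calls disagree on whether the 'User-agent: *' rules
# (e.g. 'Disallow: /temp/') apply to 'mein-Robot' once that agent
# has its own matching group.
```

Note that under the Robots Exclusion Protocol convention, a crawler obeys only the most specific matching `User-agent` group, which is the behavior `can_fetch()` exhibits here.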