CRAN v0.5.2

@petermeissner petermeissner released this 21 Nov 05:22

0.5.2 | 2017-11-12

  • fix : rt_get_rtxt() would break on Windows due to trying to readLines() from a folder

0.5.1 | 2017-11-11

  • change : spiderbar is now the non-default, second (experimental) check method
  • fix : there were warnings in case of multiple domain guessing

0.5.0 | 2017-10-07

  • feature : spiderbar's can_fetch() was added; one can now choose which check method to use for checking access rights
  • feature : use futures (from package future) to speed up retrieval and parsing
  • feature : there is now a get_robotstxts() function, which is a 'vectorized' version of get_robotstxt()
  • feature : paths_allowed() now allows checking either via robots.txt files parsed by robotstxt or via functionality provided by the spiderbar package (the latter should be faster by a factor of approximately 10)
  • feature : various functions now have an ssl_verifypeer option (analogous to the CURL option https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html), which might help with robots.txt file retrieval in some cases
  • change : user_agent for robots.txt file retrieval will now default to: sessionInfo()$R.version$version.string
  • change : robotstxt now assumes it knows how to parse robots.txt files; if it cannot parse a file, it assumes that it did not get a valid robots.txt file, meaning that there are no restrictions
  • fix : valid_robotstxt would not accept some actually valid robots.txt files
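
The "cannot parse means no restrictions" change in 0.5.0 can be illustrated with a minimal base R sketch. This is not the package's implementation: looks_like_robotstxt() and path_allowed_sketch() are hypothetical helpers invented for this example, and the rule matching is deliberately simplified (only Disallow lines, applied to all bots, matched as plain path prefixes).

```r
# Hypothetical helpers for illustration only; neither is part of robotstxt.

# TRUE if every non-empty, non-comment line has the form "field: value".
looks_like_robotstxt <- function(text) {
  lines <- trimws(unlist(strsplit(text, "\n", fixed = TRUE)))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  all(grepl("^[A-Za-z-]+[ \t]*:", lines))
}

# Simplified permission check (assumption: no wildcards, no Allow rules,
# no per-bot User-agent groups; Disallow values are plain path prefixes).
path_allowed_sketch <- function(text, path) {
  if (!looks_like_robotstxt(text)) {
    return(TRUE)  # unparsable content is treated as "no restrictions"
  }
  lines <- trimws(unlist(strsplit(text, "\n", fixed = TRUE)))
  rules <- grep("^[Dd]isallow[ \t]*:", lines, value = TRUE)
  rules <- sub("^[Dd]isallow[ \t]*:[ \t]*", "", rules)
  rules <- rules[nzchar(rules)]  # an empty Disallow value restricts nothing
  !any(startsWith(path, rules))
}

robots <- "User-agent: *\nDisallow: /private/"
html   <- "<html><body>Not Found</body></html>"

path_allowed_sketch(robots, "/private/data.html")  # FALSE
path_allowed_sketch(robots, "/public/index.html")  # TRUE
path_allowed_sketch(html,   "/private/data.html")  # TRUE (not a robots.txt)
```

The last call shows the practical consequence: a server that answers with an HTML error page instead of a robots.txt file ends up imposing no restrictions.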