Respecting robots.txt files #1496
Hi, as a feed reader, selfoss does not crawl the web – it only periodically fetches the feed URLs that the user provides. As such, I would say robots.txt does not really apply here. So if selfoss is hitting a page, it most likely means a user configured it to do so. There is also a chance that the user specified your homepage as the source URL and, since it is not a feed, the SimplePie library's smart feed discovery picks a special link from the page for some reason. Feel free to send me the logs to [email protected], I can take a look.
Ah, I see the pattern. It is one person making these two requests at 30-minute or hourly intervals, repeating.
I mistook it for crawling. That's fine at that scale. If your program gets really popular, I'll come back to request a robots.txt feature. Thanks for getting back to me! Closing the issue.
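Should such a feature ever be requested again, a minimal sketch of what a robots.txt check could look like, using Python's standard-library `urllib.robotparser` purely for illustration (selfoss itself is PHP, and the `is_fetch_allowed` helper here is hypothetical, not part of any real codebase):

```python
from urllib.robotparser import RobotFileParser


def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the raw lines of an already-downloaded robots.txt file
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rules like the ones described in this issue:
rules = "User-agent: *\nDisallow: /wiki/Special:\n"
print(is_fetch_allowed(rules, "selfoss", "https://example.org/wiki/Special:Export"))  # False
print(is_fetch_allowed(rules, "selfoss", "https://example.org/feed.xml"))             # True
```

A feed reader that wanted to honor robots.txt would fetch the file once per host, cache it, and run a check like this before each feed request.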
I looked and I did not see anything about robots.txt files in the issues.
I see web traffic on one of the servers I manage claiming to be a selfoss instance which is ~~scraping~~ requesting /wiki/Special: pages. Our robots.txt file explicitly disallows robots from scraping those pages. Is this an issue with selfoss or is this not a selfoss instance?
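For illustration, a robots.txt stanza that disallows MediaWiki Special: pages typically looks something like this (the exact rules on the server in question are an assumption):

```
User-agent: *
Disallow: /wiki/Special:
```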
I would be happy to supply some redacted logs over email if it would help.
Edit: Replacing scraping with requesting.