Entire Zooniverse web site blocked from search engines #6331
Comments
Adding |
https://fe-project.zooniverse.org/robots.txt returns a 404, and that domain can be indexed by Google. The README is available at https://fe-project.zooniverse.org/projects/assets/README.md and https://www.zooniverse.org/projects/assets/README.md, so the |
@snblickhan https://frontend.preview.zooniverse.org/robots.txt blocks search crawlers, but I think it didn’t exist prior to last week. |
#6340 removes |
@eatyourgreens do you have a suggestion for solving the scenarios you described? We do not want www.zooniverse.org/robots.txt, but we do want frontend.preview.zooniverse.org/robots.txt. How can we make that happen? |
Can you selectively add that file to the staging deploy, but not to the production deploy? I haven't thought about this for a long time, but I believe that staging and production use different Docker images. |
A very quick search of the App Router docs found the answer. https://nextjs.org/docs/app/api-reference/file-conventions/metadata/robots |
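A minimal sketch of what an environment-dependent policy could look like, assuming exactly two deploy environments. The Next.js robots file convention linked above lets `app/robots.ts` export a default function returning a `MetadataRoute.Robots` object, which Next.js serves as robots.txt. To run without Next.js, this sketch mirrors a subset of that object shape with local types; `DeployEnv`, `robotsFor` and `toRobotsTxt` are hypothetical names, not taken from the Zooniverse code.

```typescript
// Hypothetical deploy environments; the real deployment may name them differently.
type DeployEnv = "production" | "staging";

// Local types mirroring a subset of the object a Next.js App Router
// robots.ts file returns, so this sketch runs standalone.
interface RobotsRule {
  userAgent: string;
  allow?: string;
  disallow?: string;
}

interface Robots {
  rules: RobotsRule[];
  sitemap?: string;
}

// Production lets every crawler in; staging shuts every crawler out.
function robotsFor(env: DeployEnv): Robots {
  if (env === "production") {
    return { rules: [{ userAgent: "*", allow: "/" }] };
  }
  return { rules: [{ userAgent: "*", disallow: "/" }] };
}

// Serialize the rules into the plain-text robots.txt format,
// only to show what each environment would serve.
function toRobotsTxt(robots: Robots): string {
  const lines: string[] = [];
  for (const rule of robots.rules) {
    lines.push(`User-agent: ${rule.userAgent}`);
    if (rule.allow !== undefined) lines.push(`Allow: ${rule.allow}`);
    if (rule.disallow !== undefined) lines.push(`Disallow: ${rule.disallow}`);
  }
  if (robots.sitemap !== undefined) lines.push(`Sitemap: ${robots.sitemap}`);
  return lines.join("\n") + "\n";
}

console.log(toRobotsTxt(robotsFor("staging")));
console.log(toRobotsTxt(robotsFor("production")));
```

In a real `app/robots.ts`, the default export would return `robotsFor(...)` for the current environment and Next.js would do the serializing itself. How the environment is detected, and whether the file is generated at build time or per request, depends on how staging and production images are built. This sketch does not model that.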
I don't see any mention of selective deployment at the docs link. Unless you're considering using a … Was there a different question you were looking for the answer to? |
I hacked something together very quickly in #6341. It seems to work. That's probably as much as I can do for free. Copilot is great for small jobs like this. It's free for anyone with a |
For pages on
DuckDuckGo seems to be better about hiding the staging domain in search results, but has also indexed it: |
You can also add <meta name="robots" content="noindex, nofollow"> to pages that you don't want indexed, e.g. staged projects. |
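A minimal sketch of the per-page approach, assuming the app can tell whether a project page is staged. `PageInfo`, `isStaged` and `robotsMetaTag` are hypothetical names. For pages rendered by the App Router, the Next.js Metadata API's `robots` field (`index: false, follow: false`) is the framework-level way to emit the same tag.

```typescript
// Hypothetical page description: `isStaged` stands in for however the app
// knows that a project page is not public yet.
interface PageInfo {
  isStaged: boolean;
}

// Returns the robots meta tag to place in the page's <head>, or null when no
// tag is needed. Crawlers treat a page without a robots meta tag as
// "index, follow", so public pages can simply omit it.
function robotsMetaTag(page: PageInfo): string | null {
  if (!page.isStaged) {
    return null;
  }
  // Multiple directives go in one content attribute, separated by commas.
  return '<meta name="robots" content="noindex, nofollow">';
}

// A staged project page gets the tag; a launched one does not.
console.log(robotsMetaTag({ isStaged: true }));  // prints <meta name="robots" content="noindex, nofollow">
console.log(robotsMetaTag({ isStaged: false })); // prints null
```

One caveat: a crawler only sees the meta tag if robots.txt lets it fetch the page. A URL that is disallowed in robots.txt never delivers its `noindex` directive, so the two mechanisms should not be stacked on the same pages.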
Via discussion with @zwolf we decided to use the following strategy:
See zooniverse/static#380 and #6340 |
Looks like static.zooniverse.org is already in the Google index, including some data CSVs and the two-page guide to running a Zooniverse project: https://www.google.com/gasearch?q=site:static.zooniverse.org |
Fixed by zooniverse/static#380 and Google is re-crawling. |
Describe the bug
Since launching the new home page, the www.zooniverse.org domain is blocked from appearing in search engines.
To Reproduce
https://www.zooniverse.org/robots.txt disallows crawling of all URLs on the Zooniverse domain.
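To show why one rule hides the whole site, here is a simplified sketch of how a crawler decides whether a path may be fetched. `isAllowed` is a hypothetical helper that loosely follows RFC 9309. It reads only the `User-agent: *` group, treats rules as plain path prefixes (no `*` or `$` wildcards), lets the longest matching rule win, and lets `Allow` win a tie. Real crawlers do more, such as choosing the group for their own user agent.

```typescript
function isAllowed(robotsTxt: string, path: string): boolean {
  let inStarGroup = false;   // currently inside a "User-agent: *" group?
  let lastWasAgent = false;  // consecutive User-agent lines share one group
  let best: { length: number; allow: boolean } | null = null;

  for (const rawLine of robotsTxt.split("\n")) {
    // Strip comments and surrounding whitespace.
    const hash = rawLine.indexOf("#");
    const line = (hash === -1 ? rawLine : rawLine.slice(0, hash)).trim();
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // A User-agent line that follows rules starts a new group.
      if (!lastWasAgent) inStarGroup = false;
      if (value === "*") inStarGroup = true;
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (!inStarGroup || (field !== "allow" && field !== "disallow")) continue;
    // An empty rule value matches nothing, so "Disallow:" allows everything.
    if (value === "" || !path.startsWith(value)) continue;

    const allow = field === "allow";
    if (best === null || value.length > best.length || (value.length === best.length && allow)) {
      best = { length: value.length, allow };
    }
  }
  // With no matching rule, the path may be crawled.
  return best === null ? true : best.allow;
}

const blockAll = "User-agent: *\nDisallow: /\n";
console.log(isAllowed(blockAll, "/projects/")); // prints false
```

Because every path starts with `/`, `Disallow: /` matches every URL on the host. Note that robots.txt governs crawling, not indexing: a search engine can still list a blocked URL it learned about from links, just without a description.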
You can reproduce the problem by searching for Eclipsing Binary Patrol in DuckDuckGo.
https://duckduckgo.com/?q=eclipsing+binary+patrol&t=iphone&ia=web
Google doesn't seem to be affected.
https://www.google.co.uk/search?q=eclipsing+binary+patrol
Expected behavior
Search engines should be allowed to index Zooniverse, so that people can find new projects.
Additional context
robots.txt file that blocks deploys from being indexed.