Entire Zooniverse web site blocked from search engines #6331

eatyourgreens · 2024-09-22T04:49:29Z

Describe the bug

Since launching the new home page, the www.zooniverse.org domain is blocked from appearing in search engines.

Search results for Eclipsing Binary Patrol. The Zooniverse entry is a message about being blocked from showing a result.

To Reproduce

https://www.zooniverse.org/robots.txt disallows indexing of all URLs on the Zooniverse domain.
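
For anyone who wants to check this from a script rather than a browser, here is a minimal sketch of a checker. It assumes the simple case described above, where the `User-agent: *` group contains `Disallow: /`. The function name `blocksAllCrawlers` is made up for this example, and this is not a full robots.txt parser.

```typescript
// Returns true when the robots.txt text blocks every path for all crawlers,
// i.e. a `User-agent: *` group contains `Disallow: /` and no `Allow` rule.
function blocksAllCrawlers(robotsTxt: string): boolean {
  let agents: string[] = [];  // user agents of the group currently being read
  let readingRules = false;   // true once the current group has at least one rule
  let disallowAll = false;
  let hasAllow = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A user-agent line that follows rules starts a new group.
      if (readingRules) {
        agents = [];
        readingRules = false;
      }
      agents.push(value.toLowerCase());
    } else if (field === "disallow" || field === "allow") {
      readingRules = true;
      if (agents.indexOf("*") === -1) continue; // rule is for a specific bot
      if (field === "disallow" && value === "/") disallowAll = true;
      if (field === "allow" && value !== "") hasAllow = true;
    }
  }
  return disallowAll && !hasAllow;
}
```

Fetching https://www.zooniverse.org/robots.txt and passing the response text to this function would be one way to confirm the report.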

You can reproduce the problem by searching for Eclipsing Binary Patrol in DuckDuckGo.
https://duckduckgo.com/?q=eclipsing+binary+patrol&t=iphone&ia=web

Google doesn't seem to be affected.
https://www.google.co.uk/search?q=eclipsing+binary+patrol

Expected behavior

Search engines should be allowed to index Zooniverse, so that people can find new projects.

Additional context

#6010 (app-root: Dockerize app-root) added a robots.txt file that blocks deploys from being indexed.
You can use Google PageSpeed Insights to check for SEO errors: https://pagespeed.web.dev/analysis/https-www-zooniverse-org/zek3cq1vmw

eatyourgreens · 2024-09-23T07:20:53Z

Google shows links to Zooniverse. Like DuckDuckGo, the description is blocked.

Galaxy Zoo linked from a Google search for 'galaxy zoo website.' The listing says that Google is blocked from showing information about the page.

lcjohnso · 2024-09-23T16:35:19Z

Adding robots.txt originated from wanting to prevent indexing of staging pages (see #2541). Consistent with previous homepage behavior, the restriction should be removed for the main Zooniverse domain.

eatyourgreens · 2024-09-23T17:17:54Z

> Adding robots.txt originated from wanting to prevent indexing of staging pages (see #2541). Consistent with previous homepage behavior, the restriction should be removed for the main Zooniverse domain.

The robots.txt files I added in #2541 don't actually work, as I didn't publish them to their respective domain roots. 😞

https://fe-project.zooniverse.org/robots.txt returns 404, and that domain can be indexed by Google.

The Readme is available at https://fe-project.zooniverse.org/projects/assets/README.md and https://www.zooniverse.org/projects/assets/README.md so the public directory is being published, as expected. 🤔

eatyourgreens · 2024-09-24T08:49:11Z

I've opened #6335 to publish /robots.txt for the standalone projects app. My mistake in #2541 was that I published the robots file at /projects/robots.txt.

snblickhan · 2024-09-25T17:50:58Z

A report from Justin Schell re: Mapping Prejudice (project ID: 3877).

Workflow 25524 is appearing in Google search results via an FEM link (frontend.preview), but MP is a PFE project. This has led to one case where someone found this link by searching "washtenaw Zooniverse" and inadvertently submitted FEM classifications to a PFE project, not realizing there was any difference. See screenshot below:

eatyourgreens · 2024-09-25T18:15:45Z

@snblickhan https://frontend.preview.zooniverse.org/robots.txt blocks search crawlers, but I think it didn’t exist prior to last week.

eatyourgreens · 2024-09-25T18:20:31Z

#6340 removes /robots.txt from staging too, so frontend.preview would be crawlable.

goplayoutside3 · 2024-09-25T19:43:51Z

@eatyourgreens do you have a suggestion for how to solve the scenarios you described? We do not want www.zooniverse.org/robots.txt to block crawlers, but we do want frontend.preview.zooniverse.org/robots.txt to block them. How can we make that happen?

eatyourgreens · 2024-09-25T19:54:07Z

Can you selectively add that file to the staging deploy, but not to the production deploy? I haven't thought about this for a long time, but I believe that staging and production use different Docker images.

eatyourgreens · 2024-09-25T20:36:26Z

A very quick search of the app router docs found the answer.

https://nextjs.org/docs/app/api-reference/file-conventions/metadata/robots

goplayoutside3 · 2024-09-25T20:47:42Z

> the answer

I don't see any mention in the docs link about selective deployment. Unless you're considering using a robots.js file to look for certain env variables 🤔

Was there a different question you're looking for the answer to?
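
The env-variable idea mentioned above could be sketched roughly like this. This is an illustration only, not the code from any PR in this thread. `APP_ENV` is a hypothetical variable name, and the local `Robots` type is a stand-in for the `MetadataRoute.Robots` type that the linked Next.js docs describe.

```typescript
// Local stand-in for the object shape the Next.js robots file convention
// documents; in a real app you would import MetadataRoute from "next".
type RobotsRule = { userAgent: string; allow?: string; disallow?: string };
type Robots = { rules: RobotsRule };

// Allow crawlers only when the deploy identifies itself as production.
// Anything else (staging, preview, unset) gets the restrictive rules.
function buildRobots(appEnv: string | undefined): Robots {
  const isProduction = appEnv === "production";
  return {
    rules: isProduction
      ? { userAgent: "*", allow: "/" }
      : { userAgent: "*", disallow: "/" },
  };
}

// In app/robots.ts this would become the default export, e.g.
//   export default function robots() {
//     return buildRobots(process.env.APP_ENV); // APP_ENV is an assumption
//   }
```

One caveat: as far as I understand the docs, Next.js caches this route by default, so the variable may need to be set at build time rather than only at runtime.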

eatyourgreens · 2024-09-26T09:22:34Z

I hacked something together very quickly in #6341. It seems to work. That's probably as much as I can do for free.

Copilot is great for small jobs like this. It's free for anyone with a .edu or .ac.uk email address.
https://education.github.com/discount_requests/application

eatyourgreens · 2024-09-26T09:31:14Z

For pages on frontend.preview.zooniverse.org that have already been indexed by Google, you might need to go into Google Search Console (as owners of that subdomain) and explicitly ask Google to remove the frontend.preview subdomain from the search index.

https://www.google.co.uk/search?q=washtenaw+zooniverse
https://www.google.co.uk/search?q=zooniverse+alice+sandbox
https://www.google.co.uk/search?q=site%3Afrontend.preview.zooniverse.org (project pages start to appear somewhere around pages 7 or 8 of this search.)

DuckDuckGo seems to be better about hiding the staging domain in search results, but has also indexed it.

eatyourgreens · 2024-09-26T14:45:06Z

You can also add

<meta name="robots" content="noindex, nofollow">

to pages that you don't want to be indexed, e.g. staged projects.
https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag
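
A small sketch of how a page might choose that tag per project. The `isStaged` flag and both function names are hypothetical; how a project is known to be staged depends on the app.

```typescript
type RobotsMeta = { index: boolean; follow: boolean };

// Staged projects opt out of indexing and link following; others opt in.
function robotsMetaFor(isStaged: boolean): RobotsMeta {
  return { index: !isStaged, follow: !isStaged };
}

// Render the flags as a robots meta tag, with rules separated by commas.
function toMetaTag(meta: RobotsMeta): string {
  const rules = [
    meta.index ? "index" : "noindex",
    meta.follow ? "follow" : "nofollow",
  ];
  return `<meta name="robots" content="${rules.join(", ")}">`;
}
```

Calling `toMetaTag(robotsMetaFor(true))` produces the `noindex, nofollow` tag for a staged project.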

goplayoutside3 · 2024-09-27T17:40:57Z

Via discussion with @zwolf we decided to use the following strategy:

FEM apps all serve the restrictive version of /robots.txt
Host the permissive version in blob storage
Proxy explicit requests for www.zooniverse.org/robots.txt to the permissive version

See zooniverse/static#380 and #6340
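
The intended outcome of that strategy can be modelled as a small function. This only illustrates which file each host should end up serving. The real change is in the proxy configuration in zooniverse/static#380, which is not reproduced here, and the two file bodies below are assumed contents.

```typescript
// Assumed file contents, for illustration only.
const RESTRICTIVE = "User-agent: *\nDisallow: /\n";
const PERMISSIVE = "User-agent: *\nAllow: /\n";

// Only the main production host gets the permissive file; every other host
// (staging, previews, standalone apps) keeps the restrictive default.
function robotsFor(host: string): string {
  return host.toLowerCase() === "www.zooniverse.org" ? PERMISSIVE : RESTRICTIVE;
}
```

With this model, `robotsFor("frontend.preview.zooniverse.org")` returns the restrictive file, while `robotsFor("www.zooniverse.org")` returns the permissive one.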

eatyourgreens · 2024-09-27T18:27:13Z

Looks like static.zooniverse.org is already in the Google index, including some data CSVs.

https://www.google.com/gasearch?q=site:static.zooniverse.org

The index also includes the two-page guide to running a Zooniverse project.
https://static.zooniverse.org/www.citizensciencealliance.org/downloads/zooniverse_guide.pdf

goplayoutside3 · 2024-10-03T17:28:40Z

Fixed by zooniverse/static#380 and Google is re-crawling.

eatyourgreens added the bug label Sep 22, 2024

eatyourgreens mentioned this issue Sep 23, 2024: fix(app-project): publish /robots.txt for the projects app #6335 (Merged)

goplayoutside3 mentioned this issue Sep 25, 2024: app-root: Remove stub routes #6340 (Merged)

eatyourgreens mentioned this issue Sep 26, 2024: fix(app-root): allow search engines in production #6341 (Closed)

goplayoutside3 closed this as completed Oct 3, 2024
