Pages with noindex meta still showing in sitemap #6247

Closed
3 of 7 tasks
Josh-Cena opened this issue Jan 2, 2022 · 5 comments · Fixed by #7143
Labels
bug An error in the Docusaurus core causing instability or issues with its execution

Comments

@Josh-Cena
Collaborator

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

It is bad practice to include noindex pages in the sitemap.

Steps to reproduce

  1. Add <meta name="robots" content="noindex"> to the head tag of a page.
  2. Build.

Expected behavior

The sitemap doesn't include that page.

Actual behavior

It does—we don't filter routes by noindex.

We should definitely respect the noIndex config option and not output a sitemap at all in that case. For individual pages, we should probably read the generated HTML files and filter out those with a noindex meta? Or should we ask the user to provide a list of routes to ignore? (I think we should support both.)
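
The two per-page ideas above (scanning the built HTML for a robots noindex meta, and accepting a user-supplied list of routes to ignore) can be combined in one filter. What follows is only a minimal sketch under assumptions: `hasNoIndexMeta` and `filterSitemapRoutes` are hypothetical names rather than Docusaurus APIs, the HTML is passed in as an in-memory map instead of being read from the build directory, and the ignore list is matched by exact route.

```typescript
// Hypothetical helpers for illustration; none of these are Docusaurus APIs.

/** Returns true when the HTML has a robots meta tag whose content lists "noindex". */
function hasNoIndexMeta(html: string): boolean {
  // Collect every <meta ...> tag, then inspect its attributes in any order.
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  return metaTags.some((tag) => {
    const name = /\bname\s*=\s*["']?([^"'\s>]+)/i.exec(tag)?.[1];
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1] ?? '';
    return (
      name?.toLowerCase() === 'robots' &&
      content
        .toLowerCase()
        .split(',')
        .some((directive) => directive.trim() === 'noindex')
    );
  });
}

/** Keeps only routes that are neither explicitly ignored nor marked noindex. */
function filterSitemapRoutes(
  routes: string[],
  htmlByRoute: Map<string, string>,
  ignoredRoutes: string[] = [],
): string[] {
  const ignored = new Set(ignoredRoutes);
  return routes.filter(
    (route) =>
      !ignored.has(route) && !hasNoIndexMeta(htmlByRoute.get(route) ?? ''),
  );
}
```

A regular expression is used here only to keep the sketch dependency-free; a real implementation would more likely rely on an HTML parser and on whatever file layout the build actually produces.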

Your environment

No response

Reproducible demo

No response

Self-service

  • I'd be willing to fix this bug myself.
@Josh-Cena Josh-Cena added the bug An error in the Docusaurus core causing instability or issues with its execution label Jan 2, 2022
@Josh-Cena Josh-Cena added the status: blocked This issue is blocked by another issue or external dep and can't be pushed further. label Jan 3, 2022
@slorber
Collaborator

slorber commented Jan 5, 2022

For individual pages, we should probably read the HTML files and filter those with noindex?

I don't like this idea a lot 😅

Although we might also need to read HTML files for the RSS feed content with MDX, see #5664 (comment) (some code could be shared, need to take care of HTML output file patterns etc)


Another possibility is to do something similar to the broken link checker: add some extra logic to the <Head> component and notify the parent SSR renderer when a page is using a noindex meta. We'd get a list of noindex pathnames after Webpack compilation that we could pass to the sitemap plugin.

This could be generic, good enough and transparent for the user, no extra API needed?
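
The collector idea described in this comment can be sketched without any framework code. This is an illustration under assumptions, not how Docusaurus implements it: `NoIndexCollector`, `renderPage`, and `sitemapRoutes` are hypothetical names, and `renderPage` merely stands in for the server-side render pass in which a head component would report the meta tags it emits.

```typescript
// Hypothetical sketch: these names are illustrative, not Docusaurus or React APIs.

type MetaTag = { name: string; content: string };

/** Collects pathnames of pages that declare a robots noindex meta while rendering. */
class NoIndexCollector {
  private readonly pathnames = new Set<string>();

  /** Called by the head-rendering code for every meta tag a page emits. */
  report(pathname: string, meta: MetaTag): void {
    const isNoIndex = meta.content
      .toLowerCase()
      .split(',')
      .some((directive) => directive.trim() === 'noindex');
    if (meta.name.toLowerCase() === 'robots' && isNoIndex) {
      this.pathnames.add(pathname);
    }
  }

  /** Sorted list that could be handed to a sitemap generator after the build. */
  list(): string[] {
    return Array.from(this.pathnames).sort();
  }
}

/** Stand-in for server-side rendering one page: it only forwards the page's meta tags. */
function renderPage(
  pathname: string,
  metas: MetaTag[],
  collector: NoIndexCollector,
): void {
  for (const meta of metas) {
    collector.report(pathname, meta);
  }
}

/** Removes the collected noindex pathnames from the full route list. */
function sitemapRoutes(
  allRoutes: string[],
  collector: NoIndexCollector,
): string[] {
  const excluded = new Set(collector.list());
  return allRoutes.filter((route) => !excluded.has(route));
}
```

The appeal of this design is that the page author only writes the meta tag; the exclusion list is a by-product of rendering, so no extra user-facing option is required.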

@Josh-Cena
Collaborator Author

That sounds good as well! 👍

@deployn
Contributor

deployn commented Mar 27, 2022

Hello, unfortunately I'm still encountering this problem. Is there currently any way to exclude certain pages from the sitemap? I would rather not edit the sitemap manually after every build.

@Josh-Cena
Collaborator Author

Josh-Cena commented Mar 28, 2022

@deployn We'll soon have #6979 which will be what you want. For now, there's probably no (trivial) way to do it.
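
The thread does not spell out what #6979 adds, so the following is a separate, hedged note rather than a description of that PR. Later releases of `@docusaurus/plugin-sitemap` document an `ignorePatterns` option that takes glob patterns for routes to leave out of the sitemap. The fragment below is a sketch assuming the classic preset; the glob values are made-up examples, and required site fields such as the title and URL are omitted.

```typescript
// Fragment of a docusaurus.config file; the glob patterns are made-up examples.
const config = {
  presets: [
    [
      'classic',
      {
        sitemap: {
          // Routes matching these globs are left out of the generated sitemap.
          ignorePatterns: ['/tags/**', '/private/**'],
        },
      },
    ],
  ],
};

export default config;
```

This covers the "list of routes to ignore" half of the proposal only; it does not detect a noindex meta on individual pages.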

@deployn
Contributor

deployn commented Mar 28, 2022

@deployn We'll soon have #6979 which will be what you want. For now, there's probably no (trivial) way to do it.

Thank you, I hadn't seen that PR.

@Josh-Cena Josh-Cena removed the status: blocked This issue is blocked by another issue or external dep and can't be pushed further. label Apr 9, 2022
3 participants