Robots.txt generator: disallowing specific pages instead of whole sections #2876

Closed
svbeek-ida opened this issue Jun 24, 2022 · 1 comment

Comments

svbeek-ida commented Jun 24, 2022

I'd like to request a new feature in the ACS Commons sitemap generator.

Currently, when the disallowed property is added to a page, the line that is added to robots.txt looks like this:

Disallow: https://www.mysite.com/somepage/

Basically, this disallows the whole /somepage/ section. However, somepage.html does not match this path and will still be indexed. See the robots.txt spec for details: https://developers.google.com/search/docs/advanced/robots/robots_txt#disallow

Shouldn't it be possible to also disallow somepage.html, or to make it configurable whether the single page or the whole section is disallowed?

Removing the trailing slash is not an option, since the rule would then match any page starting with /somepage*
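
To illustrate the matching problem described above, here is a minimal sketch in Python. It assumes plain prefix matching on the URL path only and ignores the `*` and `$` wildcards; the function name `is_disallowed` and the sample paths are made up for illustration and are not part of ACS AEM Commons.

```python
def is_disallowed(url_path: str, disallow_rule: str) -> bool:
    """Plain prefix match; wildcards (* and $) are deliberately not handled."""
    return url_path.startswith(disallow_rule)


# Hypothetical paths used only to compare the three rule variants.
paths = [
    "/somepage/",
    "/somepage/child.html",
    "/somepage.html",
    "/somepage-other.html",
]

for rule in ("/somepage/", "/somepage", "/somepage.html"):
    blocked = [p for p in paths if is_disallowed(p, rule)]
    print(f"Disallow: {rule} blocks {blocked}")
```

With the trailing slash, `/somepage.html` is not blocked; without it, unrelated pages such as `/somepage-other.html` are blocked as well. Only a rule ending in `.html` targets the single page.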

pahupe added a commit to pahupe/acs-aem-commons that referenced this issue Jul 22, 2022
Two new OSGi config properties have been introduced:
- allow.page.property.names and
- disallow.page.property.names
to generate ALLOW and DISALLOW rules for single pages (path/page.html)

These can be combined with existing properties
- allow.property.names and
- disallow.property.names
to ALLOW single pages (path/page.html) while DISALLOWing nested paths (path/page/), or vice versa
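
A rough sketch of how the combination described in this commit message could play out, again in Python. This is not the servlet's actual code: the helpers `section_rule`, `page_rule` and `is_allowed` are hypothetical, the paths are placeholders, and the evaluation assumes the "longest matching rule wins, Allow wins ties" behaviour from Google's robots.txt documentation.

```python
def section_rule(directive: str, page_path: str):
    """Rule for everything nested below the page (path/page/)."""
    return directive, page_path + "/"


def page_rule(directive: str, page_path: str):
    """Rule for the single page itself (path/page.html)."""
    return directive, page_path + ".html"


def is_allowed(url_path: str, rules) -> bool:
    """Assumed precedence: longest matching prefix wins, Allow wins ties."""
    matches = [
        (len(path), directive == "Allow")
        for directive, path in rules
        if url_path.startswith(path)
    ]
    if not matches:
        return True  # no rule applies, so crawling is allowed
    return max(matches)[1]


# ALLOW the single page while DISALLOWing everything nested below it.
rules = [page_rule("Allow", "/path/page"), section_rule("Disallow", "/path/page")]

for directive, path in rules:
    print(f"{directive}: {path}")

for url in ("/path/page.html", "/path/page/child.html", "/path/other.html"):
    print(url, "allowed" if is_allowed(url, rules) else "disallowed")
```

Swapping the two directives gives the "vice versa" case: the single page is disallowed while its nested paths stay allowed.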
davidjgonzalez pushed a commit that referenced this issue Aug 22, 2022
* Improvement for #2876

@kwin kwin closed this as completed Sep 8, 2022
kwin commented Sep 8, 2022

Fixed in v5.3.4
