Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PWA 'pages' should set meta robots indexing directives #300

Closed
jonoalderson opened this issue Jun 29, 2020 · 13 comments · Fixed by #330
Closed

PWA 'pages' should set meta robots indexing directives #300

jonoalderson opened this issue Jun 29, 2020 · 13 comments · Fixed by #330

Comments

@jonoalderson
Copy link
Collaborator

Requests which return service worker 'pages' or content - like example.com/?wp_service_worker=1, example.com/?wp_error_template=offline and example.com/?wp_error_template=500 (and any other/similar responses) - should indicate that they shouldn't be indexed by search engines or social media.

This can be achieved by returning an x-robots-tag HTTP header with such responses, with a value of noindex, follow.

@westonruter
Copy link
Collaborator

Good point. Given the current codebase, this can be done via:

--- a/wp-includes/general-template.php
+++ b/wp-includes/general-template.php
@@ -111,6 +111,7 @@ function wp_add_error_template_no_robots() {
 function wp_unauthenticate_error_template_requests() {
 	if ( is_offline() || is_500() ) {
 		wp_set_current_user( 0 );
+		add_action( 'wp_head', 'wp_no_robots' );
 	}
 }

However, we may want to rename wp_unauthenticate_error_template_requests() to be more appropriate.

@westonruter westonruter added this to the 0.6 milestone Jun 29, 2020
@westonruter
Copy link
Collaborator

Hold on. Is this not already being done? See this code:

/**
* Add no-robots meta tag to error template.
*
* @todo Is this right? Should we add_action when we find out that the filter is present?
* @see wp_no_robots()
* @since 0.2
*/
function wp_add_error_template_no_robots() {
if ( is_offline() || is_500() ) {
wp_no_robots();
}
}

@westonruter
Copy link
Collaborator

Is not the meta tag equivalent to an X-Robots-Tag HTTP response header?

@westonruter westonruter removed this from the 0.6 milestone Jul 5, 2020
@jonoalderson
Copy link
Collaborator Author

jonoalderson commented Jul 6, 2020

Yep, that's indeed equivalent. Good catch; I was checking HTTP headers but neglected to check the HTML source... Oops!

@jonoalderson
Copy link
Collaborator Author

jonoalderson commented Jul 6, 2020

Just looks like we're missing the x-robots-tag output on ?wp_service_worker=1, then.

@westonruter
Copy link
Collaborator

Why is it needed for the service worker script? Other scripts served by WordPress don't have that header (e.g. jQuery).

@jonoalderson
Copy link
Collaborator Author

(most) other scripts are obviously scripts (e.g., ending in .js), and, exist as real files (which we can't intercept via WordPress),

@westonruter
Copy link
Collaborator

Right, but is it necessary? Do any search engines index JS assets?

@jonoalderson
Copy link
Collaborator Author

They may do, if they don't realise that they're JS assets.

This isn't the only use-case, too. Anecdotally, I'm currently dabbling with setting various HTTP headers on different templates/pages (mostly for preloading/pushing resources), and there's no 'clean' way to exclude service worker results pages.

People are going to continually run into this kind of problem, whilst we have a dynamically generated/served JavaScript file which presents as a page. The situation isn't radically different from is_robots(). Some simple wrappers for conditional logic here will make managing this much easier.

@westonruter
Copy link
Collaborator

Sorry for the long pause.

They may do, if they don't realise that they're JS assets.

Note that in #289 the URL for service workers will be /wp.serviceworker as opposed to /?wp_service_worker=1. So there will be less potential confusion.

In any case, the Content-Type is text/javascript. If they can't realize that it is a JS asset from this, then I'd consider that to be a horrible search engine 😄

In any case, I've opened a PR to add this response header: #330.

@westonruter westonruter added this to the 0.6 milestone Sep 7, 2020
@jonoalderson
Copy link
Collaborator Author

Well, undeclared file types also leave other systems blind, so it's best practice for a 'non-page' asset to declare its type in the URL. Cloudflare, for example, only caches (by default) assets based on their file extension; regardless of the Content-Type header.

@westonruter
Copy link
Collaborator

westonruter commented Sep 7, 2020

It's not feasible to for a WP-generated file to end in .js unfortunately given that many server configurations will send the request straight to the filesystem. See discussion at #287 (comment).

@jonoalderson
Copy link
Collaborator Author

Yeah, I caught that. I really like .serviceworker as a graceful workaround!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants