App Engine URL being indexed in preference to real URL #884

Closed
tunetheweb opened this issue Jun 20, 2020 · 9 comments · Fixed by #888
Labels
bug (Something isn't working), development (Building the Almanac tech stack), SEO (SEO related)

Comments

@tunetheweb
Member

As discussed in #810 (comment), some chapters seem to be indexed under the App Engine URL, despite the canonical link tag pointing to the real one:

[Screenshot: Google Search results showing the French SEO chapter indexed under the wrong URL]

Should we add a 301 redirect when we see the host is incorrect to try to give Google Search an extra nudge to look at the right place?
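A host-based 301 could be sketched as a pure function that takes the requested host, path and query string and returns the URL on the real domain to redirect to, or None when the request already targets the real domain. This is a hypothetical helper (`redirect_target`), not the Almanac's actual code; `CANONICAL_HOST` is assumed to be `almanac.httparchive.org` (taken from the canonical tag quoted in this thread), and hooking it into the web framework to send the 301 with a `Location` header is left out.

```python
from typing import Optional
from urllib.parse import urlunsplit

# Assumption: the production hostname, taken from the canonical tag in this thread.
CANONICAL_HOST = "almanac.httparchive.org"


def redirect_target(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    Hypothetical helper: any host other than CANONICAL_HOST (for example an
    App Engine *.appspot.com version URL) is sent to the same path and query
    on the real domain.
    """
    # Drop any port and normalise case before comparing hostnames.
    hostname = host.split(":", 1)[0].lower()
    if hostname == CANONICAL_HOST:
        return None
    return urlunsplit(("https", CANONICAL_HOST, path, query, ""))
```

A real deployment would likely need an exemption for test versions that are deliberately served from appspot URLs, which is the concern raised about redirects in this thread.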

@tunetheweb tunetheweb added bug Something isn't working development Building the Almanac tech stack SEO SEO related labels Jun 20, 2020
@rviscomi
Member

Can this be solved with canonical URLs?
https://support.google.com/webmasters/answer/139066?hl=en

@tunetheweb
Member Author

Look at the source of that link: https://20200110t162143-dot-webalmanac.appspot.com/fr/2019/seo

It has the canonical URL correctly set (and always has):

<link rel="canonical" href="https://almanac.httparchive.org/fr/2019/seo" />
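For illustration, a tag like the one above can be generated from the request path alone, so every host (appspot version URLs included) emits the same absolute canonical URL. A minimal sketch, assuming a hypothetical `canonical_link_tag` helper and the production origin shown in the tag; it is not taken from the Almanac's templates.

```python
from html import escape

# Assumption: production origin, as used in the canonical tag above.
CANONICAL_ORIGIN = "https://almanac.httparchive.org"


def canonical_link_tag(path: str) -> str:
    """Build a rel=canonical tag that ignores the host the page was served on."""
    # Make sure the path is absolute so it joins cleanly onto the origin.
    if not path.startswith("/"):
        path = "/" + path
    href = CANONICAL_ORIGIN + path
    # Escape the URL for safe use inside an HTML attribute value.
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'
```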

Also testing the link in Google Search Console shows it’s correctly being picked up. But for some reason Google just doesn’t want to index it 😔

@tunetheweb
Member Author

An alternative, which might be safer and would also let us keep using test URLs on the odd occasion we need them, would be to add a <meta name="robots" content="noindex"> tag to the head when the request isn't for the main domain, rather than redirecting.
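The host check behind that suggestion is small. A minimal sketch, assuming a hypothetical `robots_meta` helper whose output a template would place in the page head, and assuming `almanac.httparchive.org` is the only host that should be indexed:

```python
# Assumption: the only host that should be indexed.
CANONICAL_HOST = "almanac.httparchive.org"

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta(host: str) -> str:
    """Return the robots meta tag for <head>, or "" when serving the real domain.

    Hypothetical helper: every other host (appspot version URLs, temporary
    test deployments) gets noindex.
    """
    # Drop any port and normalise case before comparing hostnames.
    hostname = host.split(":", 1)[0].lower()
    return "" if hostname == CANONICAL_HOST else NOINDEX_TAG
```

Unlike a redirect, this keeps non-production versions usable in a browser while asking search engines not to index them.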

@tunetheweb tunetheweb changed the title Redirect App Engine URL to real URL App Engine URL being indexed in preference to real URL Jun 20, 2020
@tunetheweb
Member Author

@softplus

We picked this URL up from crawling the rest of the appspot site. In general, the rel-canonical is the right approach here, but it's not guaranteed or immediate, especially when we don't crawl much from that part of the site. Using a redirect would probably speed that up a bit, but apart from the URL, it won't change anything for search. There's no need to force this kind of thing.

@tunetheweb
Member Author

Thanks @softplus, so it's best to just leave it as is? And a noindex isn't necessary and won't help?

It's just that it's been over two months since we launched some of these chapters:
[Screenshot: Google Search Console showing 0 error pages, 0 warning pages, 61 valid pages and 11 not-indexed pages]

Google is aware of them but stubbornly doesn't seem to want to index them:

[Screenshot: Google Search Console showing 8 pages discovered but not indexed and 3 pages not selected as canonical]

@softplus

We're not indexing the other version at the moment, and this version (the appspot one) isn't crawled that frequently, so a noindex wouldn't be useful. One way you could help to get the preferred version indexed is to make sure that all internal links also point at that version of the site. At the moment you're using relative links, so if a part of the appspot site is crawled, it's easy to just keep crawling there. Anyway, I don't think this would change anything with regards to that page in the search results, at most it would just be the other URL being shown (which might make it easier for tracking on your side, but that's about it). If it were a really large site, and especially if there were a lot of these duplicates, that would be a case where I'd suggest finding ways to simplify crawling & indexing by getting these URLs better-folded together; in this case, I don't think you're running into critical issues in that regard though.
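The point about relative links can be illustrated with `urllib.parse.urljoin`: a relative link resolves against whichever host the page was fetched from, so links on an appspot page keep leading to appspot, whereas a link pinned to the canonical host does not. The hostnames come from this thread; `resolve_link` and its `force_canonical` flag are hypothetical, for illustration only.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumption: the preferred host, taken from the canonical tag in this thread.
CANONICAL_HOST = "almanac.httparchive.org"


def resolve_link(page_url: str, href: str, force_canonical: bool = False) -> str:
    """Resolve href as a crawler would, optionally pinning internal links to the canonical host."""
    # A crawler resolves relative links against the URL it fetched the page from.
    absolute = urljoin(page_url, href)
    parts = urlsplit(absolute)
    if not force_canonical or parts.netloc != urlsplit(page_url).netloc:
        # Leave the link alone when not pinning, or when it points off-site.
        return absolute
    # Rewrite the host so internal links always lead back to the preferred site.
    return urlunsplit(("https", CANONICAL_HOST, parts.path, parts.query, parts.fragment))
```

The trade-off, raised in the reply below, is that pinned links would also send anyone browsing a development version back to production.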

It's really common that not all of a site is indexed, and "unfortunately" Search Console makes it easy to find these kinds of quirks :).

@tunetheweb
Member Author

tunetheweb commented Jun 21, 2020

Yeah, it’s the combination of the pages not being indexed and then seeing the other site being indexed that was perplexing me.

And the irony of not being able to get the SEO chapter to index correctly is especially galling! 😀 Though at least the original English version is indexed.

OK, I guess we’ll just have to wait and see. I’m not a fan of using full (absolute) links, as that would interfere with the development site. That’s also a reason to avoid a 301: sometimes we want another version of the site up there temporarily to compare with production. We could do either or both of these temporarily to give it an extra push that might sort this out, but as you say there’s no need to force this, and there’s no guarantee it would help, so I think I won’t bother.

I’m still tempted to add the noindex meta tag when the appspot domain name is used, as there are no downsides to that as far as I can see (except that the appspot version will no longer be indexed, but to be honest I actually see that as a positive). It may not help much with the current situation, but it could perhaps avoid it in future. And the reality is we don’t want these versions indexed, as they’re confusing to users (and perhaps to Google!). Especially as, like I say, we sometimes have development versions up that definitely should not be indexed.

@tunetheweb
Member Author

Added the noindex meta tag and confirmed it only shows when the real domain isn't used. We'll monitor and see if that makes any difference.
