tell search engines to not index individual versions #4107
I've been frustrated by this too, excellent idea. |
It looks like there's a canonical tag in the
Personally, I'd just update the canonical tag to point to the non-versioned gem URL and continue to allow indexing (i.e. don't merge this PR). I'm not an SEO expert, but I think |
ahhhh - nice catch - i think that explains the horrible SEO that is unique to rubygems - an actual bug with the canonical link. will change the other PR. thanks! |
although after doing 30 seconds of research i'll backpedal on that... i don't think canonical should necessarily point to the non-version url. canonical is to remove but, i think the fact that the versions pages do not link back to the main page at all currently is probably the main SEO problem, so the other PR as-is will probably make progress. |
Again, caveat: I'm no SEO expert, but... the trouble is if you Canonical tags are designed to de-dupe largely duplicate pages, not just remove query strings etc, meaning most of the value of links to version-specific pages would transfer to the overall gem page. For example, just say you have an index page listing products in a category (typical ecommerce example), you don't want every page indexing as the content is largely the same, you want to centralise that inbound link value and you ideally want to direct users from Google to the first page so they see the latest products etc, so you'd canonicalise page 2..N back to page 1, regardless of the URL structure. |
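To make the canonicalisation option concrete, here is a minimal sketch of mapping a version page URL to the overall gem page and emitting the tag. The helper names `canonical_gem_url` and `canonical_link_tag` are hypothetical, not rubygems.org code, and the `/gems/<name>/versions/<number>` layout is assumed from the URLs quoted in this thread.

```ruby
require "cgi"
require "uri"

# Hypothetical helper: strips the /versions/<number> suffix (and any query
# string or fragment) so every version page resolves to the gem page URL.
def canonical_gem_url(url)
  uri = URI.parse(url)
  uri.path = uri.path.sub(%r{\A(/gems/[^/]+)/versions/[^/]+/?\z}, '\1')
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

# Hypothetical helper: renders the <link> tag for the page <head>.
def canonical_link_tag(url)
  %(<link rel="canonical" href="#{CGI.escapeHTML(canonical_gem_url(url))}">)
end

puts canonical_link_tag("https://rubygems.org/gems/puma/versions/6.4.0")
# prints: <link rel="canonical" href="https://rubygems.org/gems/puma">
```

A URL that is already a gem page passes through unchanged, so the same helper could be used on both page types.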
Codecov Report
@@ Coverage Diff @@
## master #4107 +/- ##
==========================================
+ Coverage 98.86% 98.90% +0.03%
==========================================
Files 275 276 +1
Lines 6259 6273 +14
==========================================
+ Hits 6188 6204 +16
+ Misses 71 69 -2 |
To be honest, I have no idea what the best practice is here. @jjb would you mind checking how, for example, npm or PyPI do this? On one side I do understand search form |
I'm hoping that #4108 will solve most or all of the problem. before that went live (about 10 days ago?), individual version pages had zero links back to their main gem page. regarding this PR, i think it's true that we don't want a search for [puma 6.4.0] to have the specific version page completely absent from results, so i'll close this PR. regarding canonical URLs, from my understanding they aren't made to point to a "main" page, they are made to remove query string cruft, or maybe not have |
maybe one approach is to use "priority" in a sitemap. rubygems.org doesn't currently have a sitemap so this would be a nontrivial project. |
@ryantownsend do you know if i'm wrong on this?
|
@jjb correct, if you use canonicals, the specific version pages would no longer be in the Google results, if you searched for "Puma 5.0.4" you'd just see a result for the "Puma" gem page. The question to me is: how valuable is it to land someone on that specific version page vs landing them on the gem page?
Personally, if I were running the site, I'd just direct everyone to the overall gem page, leaving the version pages linkable for any blogs or sharing links among coworkers etc but canonicalised back to the overall gem page, that way only Google is affected. I'd want to concentrate all the value from backlinks into that one page for better rankings, plus it would reduce the crawler overhead. To most sites this definitely doesn't matter, but you effectively have 12,000 gems (estimated from 402 pages of 30) each with potentially hundreds of versions, so Google is unlikely to crawl and index all of those pages, at least not quickly.
This then means you can easily generate a sitemap file that you can add to Google Search Console (with all 12,000 gems, it may be worth breaking it down into one sitemap per letter with a sitemap index file). This will mean new gems will be discovered and indexed more promptly, given you're not wasting the crawler's time with all the new gem versions every day.
If you decide against canonicalising to the main gem page, you might want to add JSON-LD to the version pages. This will mean Google may show a breadcrumb against the result, giving people the option to jump straight into the gem page too.
Gem Page:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [{
"@type": "ListItem",
"position": 1,
"name": "Gems",
"item": "https://rubygems.org/gems"
}, {
"@type": "ListItem",
"position": 2,
"name": "Puma",
"item": "https://rubygems.org/gems/puma"
}]
}
</script>
Version Page:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [{
"@type": "ListItem",
"position": 1,
"name": "Gems",
"item": "https://rubygems.org/gems"
}, {
"@type": "ListItem",
"position": 2,
"name": "Puma",
"item": "https://rubygems.org/gems/puma"
}, {
"@type": "ListItem",
"position": 3,
"name": "Version 6.4.0",
"item": "https://rubygems.org/gems/puma/versions/6.4.0"
}]
}
</script> |
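Rather than hand-writing the two snippets above for every page, the breadcrumb could be built from the gem name and an optional version. This is only a sketch: `breadcrumb_json_ld` and `breadcrumb_script_tag` are hypothetical names, and it uses the raw gem name ("puma") as the display name rather than a capitalised one.

```ruby
require "json"

# Hypothetical helper: builds the BreadcrumbList hash for a gem page, or for
# a version page when a version is given.
def breadcrumb_json_ld(gem_name, version = nil, base: "https://rubygems.org/gems")
  crumbs = [["Gems", base], [gem_name, "#{base}/#{gem_name}"]]
  crumbs << ["Version #{version}", "#{base}/#{gem_name}/versions/#{version}"] if version

  items = crumbs.each_with_index.map do |(name, url), index|
    { "@type" => "ListItem", "position" => index + 1, "name" => name, "item" => url }
  end

  { "@context" => "https://schema.org", "@type" => "BreadcrumbList", "itemListElement" => items }
end

# Hypothetical helper: wraps the JSON in the script tag. "</" is escaped so
# the payload can never close the surrounding <script> element early.
def breadcrumb_script_tag(gem_name, version = nil)
  json = JSON.pretty_generate(breadcrumb_json_ld(gem_name, version))
  safe_json = json.gsub("</") { "<\\/" }
  %(<script type="application/ld+json">\n#{safe_json}\n</script>)
end

puts breadcrumb_script_tag("puma", "6.4.0")
```

Calling `breadcrumb_script_tag("puma")` without a version yields the two-level gem page breadcrumb.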
@ryantownsend ℹ️ there are 192918 gems with total of 1558031 versions (data from latest dump at https://rubygems.org/pages/data) |
@simi wow, okay. At this volume, I'd personally focus on ensuring all the gem pages are indexed. It would be worthwhile getting Google Search Console set up if you haven't already, as it'll tell you how many pages are discovered/indexed. Given there's no sitemap file, I'd bet many gems might even be missing, let alone having all 1.5 million versions indexed.
A sitemap would definitely need breaking down into multiple files using a sitemap index file: https://developers.google.com/search/docs/crawling-indexing/sitemaps/large-sitemaps
As per Google's size limits, each file can be at most 50MB uncompressed and hold at most 50,000 entries, so generating 26 alphabetical sitemaps, indexed by the first letter of the gem name (36 if you include numbers), is probably going to be easiest. It'll need to be more granular if you wanted to include all versions too though. I'd put this in a periodic job which regenerates each file at least once per day, rather than having an exceptionally slow request flow through the web server. |
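The sharding described above could look roughly like this. It is a sketch under assumptions: the function names, the `sitemap-<letter>-<n>.xml` file naming, and the `https://rubygems.org/sitemaps` location are all made up for illustration, and a real job would also need to check the 50MB size limit and write the files somewhere.

```ruby
require "cgi"

MAX_URLS_PER_SITEMAP = 50_000 # per-file entry limit mentioned above

# Groups gem names by first character, then splits each group into files
# holding at most max_per_file names. Returns { filename => [names] }.
def shard_gem_names(names, max_per_file: MAX_URLS_PER_SITEMAP)
  shards = {}
  names.sort.group_by { |name| name[0].downcase }.each do |bucket, bucket_names|
    bucket_names.each_slice(max_per_file).with_index(1) do |slice, number|
      shards["sitemap-#{bucket}-#{number}.xml"] = slice
    end
  end
  shards
end

# Renders one sitemap file listing the gem pages in a shard.
def sitemap_xml(names, base: "https://rubygems.org/gems")
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  names.each { |name| lines << "  <url><loc>#{base}/#{CGI.escapeHTML(name)}</loc></url>" }
  lines << "</urlset>"
  lines.join("\n")
end

# Renders the sitemap index that points at every shard file.
# The base URL here is a hypothetical location, not an existing endpoint.
def sitemap_index_xml(filenames, base: "https://rubygems.org/sitemaps")
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  filenames.each { |file| lines << "  <sitemap><loc>#{base}/#{CGI.escapeHTML(file)}</loc></sitemap>" }
  lines << "</sitemapindex>"
  lines.join("\n")
end

shards = shard_gem_names(%w[puma pg rails rake])
puts sitemap_index_xml(shards.keys)
```

Splitting within a letter keeps the scheme valid even if one letter ever exceeds 50,000 gems, and the periodic job would regenerate each shard plus the index.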
personally i agree with @ryantownsend's reasoning for why specific versions don't need to be in search results, but i am just one random rubyist. if someone wants to get to a specific version, it's easy to click on "show all versions" and find it. if we think just nuking all the versions from google is acceptable, then i think no need to implement sitemap, just do it in robots like this closed PR does. ryan's proposal would be better and fancier and offer more options like priorities, but if current status quo is pretty good, no need to add more complication IMO. |